Abstract


Distances to compact sets are widely used in the field of Topological Data Analysis for inferring geometric and topological features from point clouds. In this context, the distance to a probability measure (DTM) has been introduced by Chazal et al. (2011b) as a robust alternative to the distance to a compact set. In practice, the DTM can be estimated by its empirical counterpart, that is, the distance to the empirical measure (DTEM). In this paper we give a tight control of the deviation of the DTEM. Our analysis relies on a local analysis of empirical processes. In particular, we show that the rate of convergence of the DTEM directly depends on the regularity at zero of a particular quantile function which contains some local information about the geometry of the support. This quantile function is the relevant quantity to describe precisely how difficult a geometric inference problem is. Several numerical experiments illustrate the convergence of the DTEM and also confirm that our bounds are tight.


1 Introduction and motivation 


The last decades have seen an explosion in the amount of available data in almost all domains of 
science, industry, economy and even everyday life. These data, often coming as point clouds embedded 
in Euclidean spaces, usually lie close to some lower dimensional geometric structures (e.g. manifold, 
stratified space,...) reflecting properties of the system from which they have been generated. Inferring 
the topological and geometric features of such multivariate data has recently attracted a lot of interest 
in both statistical and computational topology communities. 

Considering point cloud data as independent observations of some common probability distribution P in $\mathbb{R}^d$, many statistical methods have been proposed to infer the geometric features of the support of P, such as principal curves and surfaces (Hastie and Stuetzle 1989), multiscale geometric analysis (Arias-Castro et al. 2006), density-based approaches (Genovese et al. 2009) or support estimation, to name a few. Although they come with statistical guarantees, these methods usually do not provide geometric guarantees on the estimated features.


On the other hand, with the emergence of Topological Data Analysis (Carlsson 2009), purely geometric methods have been proposed to infer the geometry of compact subsets of $\mathbb{R}^d$. These methods aim at recovering precise geometric information of a given shape; see, e.g., Chazal et al. (2009a,b); Chazal and Lieutier (2008); Niyogi et al. (2008). Although these methods come with strong topological and geometric guarantees, they usually rely on sampling assumptions that do not apply in statistical settings. In particular, these methods can be very sensitive to outliers. Indeed, they generally rely on the study of the sublevel sets of distance functions to compact sets. In practice, only a sample drawn on, or close to, a geometric shape is known, and thus only a distance to the data can be computed. Since the sup norm between the distance to the data and the distance to the underlying shape is exactly the Hausdorff distance between the data and the shape, the statistical analysis of standard TDA methods boils down to the problem of support estimation in Hausdorff metric. This last problem has been the subject of much study in statistics (see for instance Cuevas and Rodríguez-Casal 2004; Devroye and Wise 1980; Singh et al. 2009). Since standard TDA methods strongly depend on the estimation of the support in Hausdorff metric, it is now clear why they may be very sensitive to outliers.
To provide a more robust approach to TDA, a notion of distance function to a measure (DTM) in $\mathbb{R}^d$ has been introduced by Chazal et al. (2011b) as a robust alternative to the classical distance to compact sets. Given a probability distribution P in $\mathbb{R}^d$ and a real parameter $0 \le u < 1$, Chazal et al. (2011b) generalize the notion of distance to the support of P by the function

$$\delta_{P,u} : x \in \mathbb{R}^d \mapsto \inf\{t > 0 \,;\, P(B(x,t)) > u\}, \qquad (1)$$


where $B(x,t)$ is the closed Euclidean ball of center x and radius t. For $u = 0$, this function coincides with the usual distance function to the support of P. For higher values of u, it is larger than the usual distance function since a portion of mass u has to be included in the ball centered on x. To avoid issues due to discontinuities of the map $P \mapsto \delta_{P,u}$, the distance to measure (DTM) function with parameter $m \in (0,1]$ and power $r \ge 1$ is defined by

$$d_{P,m,r} : x \in \mathbb{R}^d \mapsto \left(\frac{1}{m}\int_0^m \delta^r_{P,u}(x)\,du\right)^{1/r}. \qquad (2)$$


It was shown in Chazal et al. (2011b) that the DTM shares many properties with classical distance functions that make it well-adapted for geometric inference purposes (see the theorem recalled in Appendix A). First, it is stable with respect to perturbations of P in the Wasserstein metric. This property implies that the DTMs associated to distributions that are close in the Wasserstein metric have close sublevel sets. Moreover, when $r = 2$, the function $d^2_{P,m,2}$ is semiconcave, ensuring strong regularity properties on the geometry of its sublevel sets. Using these properties, Chazal et al. (2011b) show that, under general assumptions, if $\tilde P$ is a probability distribution approximating P, then the sublevel sets of $d_{\tilde P,m,2}$ provide a topologically correct approximation of the support of P. The introduction of the DTM has motivated further works and applications in various directions, such as topological data analysis (Buchet et al. 2015a), GPS trace analysis (Chazal et al. 2011a), density estimation (Biau et al. 2011), deconvolution (Caillerie et al. 2011) or clustering (Chazal et al. 2013), just to name a few. Approximations, generalizations and variants of the DTM have also been recently considered in Buchet et al. (2015b); Guibas et al. (2013); Phillips et al. (2014). However, no strong statistical analysis of the DTM has been proposed so far.

In practice, the measure P is usually only known through a finite set of observations $\mathbb{X}_n = \{X_1, \dots, X_n\}$ sampled from P, raising the question of the approximation of the DTM. A natural idea to estimate the DTM from $\mathbb{X}_n$ is to plug the empirical measure $P_n$ instead of P into the definition of the DTM. This "plug-in strategy" corresponds to computing the distance to the empirical measure (DTEM). It can be applied with other estimators of the measure P; for instance, in Caillerie et al. (2011) it was proposed to plug a deconvolved measure into the DTM.


For $m = \frac{k}{n}$, the DTEM satisfies

$$d^r_{P_n,k/n,r}(x) = \frac{1}{k}\sum_{j=1}^{k} \|x - \mathbb{X}_n\|^r_{(j)},$$

where $\|x - \mathbb{X}_n\|_{(j)}$ denotes the distance between x and its j-th nearest neighbor in $\{X_1, \dots, X_n\}$. This quantity can be easily computed in practice since it only requires the distances between x and the sample points.
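The k-nearest-neighbor formula above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under our own conventions, not the authors' code: the function names `dtem_power_r` and `dtem` are hypothetical, and the sample is assumed to be stored as an `(n, d)` array.

```python
import numpy as np

def dtem_power_r(x, sample, k, r=2.0):
    """r-th power of the DTEM: (1/k) * sum_{j<=k} ||x - X||_{(j)}^r.

    x: point of shape (d,); sample: array of shape (n, d); 1 <= k <= n.
    """
    x = np.asarray(x, dtype=float)
    sample = np.asarray(sample, dtype=float)
    n = sample.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    dists = np.linalg.norm(sample - x, axis=1)
    # np.partition places the k smallest distances (in arbitrary order) first
    nearest = np.partition(dists, k - 1)[:k]
    return float(np.mean(nearest ** r))

def dtem(x, sample, k, r=2.0):
    """Distance to the empirical measure d_{P_n,k/n,r}(x)."""
    return dtem_power_r(x, sample, k, r) ** (1.0 / r)

sample = np.array([[0.0], [1.0], [3.0]])
print(dtem_power_r([0.0], sample, k=2, r=2.0))  # (0^2 + 1^2) / 2 = 0.5
print(dtem([0.0], sample, k=1))                 # 0.0: k = 1 gives the distance to the data
```

A partial sort is enough here because only the k smallest distances enter the formula, not their order.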
Let us introduce

$$\tilde\Delta_{n,k/n,r}(x) := d_{P_n,k/n,r}(x) - d_{P,k/n,r}(x) \qquad (3)$$

and

$$\Delta_{n,m,r}(x) := d^r_{P_n,m,r}(x) - d^r_{P,m,r}(x).$$


The aim of this paper is to study the deviations and the rate of convergence of $\Delta_{n,m,r}(x)$. The functional convergence of the DTEM has been studied recently in Chazal et al. (2014a), where it is shown that the parametric convergence rate in $1/\sqrt{n}$ is achieved under reasonable assumptions. In this paper we address the question of the convergence in probability and the rate of convergence in expectation of $\Delta_{n,m,r}(x)$, from both an asymptotic and a non-asymptotic perspective.

The stability properties of the DTM with respect to Wasserstein metrics suggest that this problem could be addressed using known results about the convergence of the empirical measure to P under Wasserstein metrics. This last problem has been the subject of many works in the past (del Barrio et al. 1999, 2005; Rachev and Rüschendorf 1998) and it is still an active field of research (Dereich et al. 2013; Fournier and Guillin 2013). Contrary to the context of TDA with the standard distance function, where stability results provide optimal rates of convergence (see Chazal et al. (2015)), we show in this paper that Wasserstein stability does not lead to optimal results for the DTM. Moreover, such a basic approach does not provide a correct understanding of the influence of the parameter m (see Appendix A).

We adopt an alternative approach based on the observation that the DTM only depends on a push-forward measure of P on the real line. Indeed, the DTM can be rewritten as follows:

$$d^r_{P,m,r}(x) = \frac{1}{m}\int_0^m F^{-1}_{x,r}(u)\,du, \qquad (4)$$

where $F^{-1}_{x,r}$ is the quantile function of the push-forward probability measure of P by the function $y \mapsto \|x - y\|^r$ (see Appendix B.1 for a rigorous proof). Then we have

$$\Delta_{n,m,r}(x) = \frac{1}{m}\int_0^m \left[F^{-1}_{x,r,n}(u) - F^{-1}_{x,r}(u)\right] du, \qquad (5)$$

where $F_{x,r,n}$ is the empirical distribution function of the observed distances (to the power r): $\|x - X_1\|^r, \dots, \|x - X_n\|^r$. We study the convergence of $\Delta_{n,m,r}(x)$ to zero from both an asymptotic and a non-asymptotic point of view. An asymptotic approach means that we take $k = k_n := mn$ for some fixed m and we study the mean rate of convergence to zero of $\Delta_{n,k/n,r}(x)$. A non-asymptotic approach means that n is fixed and the problem is then to get a tight expectation bound on $\Delta_{n,k/n,r}(x)$. In particular, we are interested in the situation where $k/n$ is chosen very close to zero. This situation is of primary interest since it corresponds to the realistic situation where we use the DTM to clean the support from a small proportion of outliers.
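Representation (5) can be checked numerically. The sketch below (Python/NumPy; the function name `empirical_dtm_power_r` is hypothetical) integrates the empirical quantile function of the distances to the power r. This quantile function is a step function equal to the j-th smallest value on $((j-1)/n, j/n]$. For $m = k/n$ the integral reduces to the k-nearest-neighbor average of the DTEM. For P uniform on $[0,1]$, $x = 0$ and $r = 1$, one has $F^{-1}_{x,1}(u) = u$, so $d_{P,m,1}(0) = m/2$, which a large sample should approximately recover. The sample sizes and seed are arbitrary illustrative choices.

```python
import numpy as np

def empirical_dtm_power_r(x, sample, m, r=2.0):
    """(1/m) * integral over (0, m] of the empirical quantile function of
    ||x - X_i||^r, i.e. d^r_{P_n,m,r}(x) for any m in (0, 1]."""
    if not 0.0 < m <= 1.0:
        raise ValueError("m must lie in (0, 1]")
    sample = np.asarray(sample, dtype=float)
    x = np.asarray(x, dtype=float)
    # sorted distances to the power r: the empirical quantile equals d[j-1]
    # on the interval ((j-1)/n, j/n]
    d = np.sort(np.linalg.norm(sample - x, axis=1) ** r)
    n = d.size
    t = m * n                                  # m measured in units of 1/n
    full = min(n, int(np.floor(t + 1e-9)))     # complete steps inside (0, m]
    total = d[:full].sum()
    rest = t - full                            # length of the last, partial step
    if full < n and rest > 1e-9:
        total += rest * d[full]
    return total / n / m                       # each step has width 1/n

rng = np.random.default_rng(0)
# Check 1: for m = k/n the integral equals the mean of the k smallest values.
pts = rng.normal(size=(50, 2))
x0 = np.zeros(2)
k = 7
knn = np.mean(np.sort(np.sum((pts - x0) ** 2, axis=1))[:k])  # r = 2
print(np.isclose(empirical_dtm_power_r(x0, pts, k / 50, r=2.0), knn))

# Check 2: P uniform on [0, 1], x = 0, r = 1, so d_{P,m,1}(0) = m / 2.
unif = rng.random((200_000, 1))
print(empirical_dtm_power_r([0.0], unif, 0.1, r=1.0))  # close to 0.05
```

Unlike the k-nearest-neighbor formula, this version also accepts values of m that are not multiples of $1/n$.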

Our results rely on a local analysis of the empirical process to compute tight deviation bounds of $\Delta_{n,k/n,r}(x)$. More precisely, we use a sharp control of a supremum defined on the uniform empirical process. Such local analysis has been successfully applied in the literature on non-asymptotic statistics; for instance, Mammen et al. (1999) obtain fast rates of convergence in classification. For a more general presentation of these ideas in model selection, see Massart (2007) and in particular Section 1.2 in the Introduction of this monograph.

We show that the rate of convergence of $\Delta_{n,k/n,r}(x)$ directly depends on the regularity at zero of the quantile function $F^{-1}_{x,r}$. This quantile function appears to be the relevant quantity to describe precisely how difficult a geometric inference problem is. The second contribution of this paper is relating the regularity of the quantile function $F^{-1}_{x,r}$ to the geometry of the support, establishing a link between the complexity of the geometric problem and a purely probabilistic quantity.

Our main results, the deviation bounds and the rate of convergence of $\Delta_{n,k/n,r}(x)$ derived from the local analysis, are given in Section 2. These results are stated in terms of the regularity of the quantile function $F^{-1}_{x,r}$. Generally speaking, it is not easy to determine the regularity of the quantile function $F^{-1}_{x,r}$ given a distribution P and an observation point $x \in \mathbb{R}^d$. Indeed, it depends on the shape of the support of P, on the way the measure P is distributed on its support and on the position of x with regard to the support of P. This is why, in the results given in Section 2, the assumptions are made directly on the quantile functions $F^{-1}_{x,r}$. Section 3 is then devoted to the geometric interpretation of these results and their assumptions. In Section 4, several numerical experiments illustrate the convergence of the DTEM and also confirm that our bounds are sharp. Rates of convergence derived from stability results of the DTM are presented in Appendix A. Proofs and background about empirical processes and quantiles can also be found in the appendices.
Notation. Let $a \wedge b$ and $a \vee b$ denote the minimum and the maximum between two real numbers a and b. The Euclidean norm on $\mathbb{R}^d$ is $\|\cdot\|$. The open Euclidean ball of center x and radius t is denoted by $B(x,t)$. For some point x and a compact set K in $\mathbb{R}^d$, the distance between x and K is defined by $\|K - x\| := \inf_{y \in K}\|y - x\|$. The Hausdorff distance between two compact sets K and K' is denoted by $\mathrm{Haus}(K, K')$. A probability distribution on $\mathbb{R}$ defined by a distribution function F is denoted by $dF$. The quantile function $F^{-1}$ of $dF$ is defined by

$$F^{-1}(u) := \inf\{t \in \mathbb{R},\ F(t) \ge u\}, \qquad 0 < u < 1.$$

By monotonicity, the quantile function $F^{-1}$ can be extended at 0 and at 1 by setting $F^{-1}(0) = \inf\{t \in \mathbb{R},\ F(t) > 0\}$ and $F^{-1}(1) = \sup\{t \in \mathbb{R},\ F(t) < 1\}$. Finally, for two positive sequences $(a_n)$ and $(b_n)$, we use the standard notation $a_n \lesssim b_n$ if there exists a positive constant C such that $a_n \le C b_n$.

2 Main results 

We fix $r \ge 1$ and we henceforth write $F_x$ for $F_{x,r}$ to facilitate the reading. In the same way, we use the notation $\Delta_{n,m}$ and $d_{P,m}$ since there is no ambiguity on the power term r.

Given an observation point $x \in \mathbb{R}^d$, we introduce the (possibly infinite) modulus of continuity $\omega_x$ of $F_x^{-1}$, which is defined for any $v \in (0,1]$ by

$$\omega_x(v) := \sup_{(u,u') \in [0,1]^2,\ |u - u'| \le v} \left|F_x^{-1}(u) - F_x^{-1}(u')\right|.$$

Note that $\omega_x$ is finite if and only if the support of P is bounded. An extensive discussion about the relation between the measure P and the modulus of continuity of $F_x^{-1}$ is proposed in Section 3. Since the function $\omega_x$ is non-decreasing and non-negative, it has a non-negative limit $\omega_x(0^+)$ at zero. In particular, we do not assume here that $\omega_x(0^+) = 0$; in other terms, we do not assume that $F_x^{-1}$ is continuous. We extend $\omega_x$ at zero by taking $\omega_x(0) = \omega_x(0^+)$.
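As an illustration of this definition, the modulus of continuity of a known quantile function can be approximated on a regular grid of $[0,1]$. Since $F_x^{-1}$ is non-decreasing, the supremum over pairs at distance at most v is reached by pairs exactly v apart, so it suffices to scan increments over a fixed number of grid steps. The sketch below assumes the quantile function is available in closed form, and the helper name `modulus_of_continuity` is hypothetical. For P uniform on $[0,1]$, $x = -1$ and $r = 2$, one has $F_x^{-1}(u) = (1+u)^2$ and $\omega_x(v) = 4 - (2 - v)^2 = 4v - v^2$.

```python
import numpy as np

def modulus_of_continuity(qf, v, grid_size=1001):
    """Grid approximation of omega(v) = sup_{|u - u'| <= v} |qf(u) - qf(u')|
    for a non-decreasing, vectorized quantile function qf on [0, 1]."""
    u = np.linspace(0.0, 1.0, grid_size)
    q = qf(u)
    # number of grid steps of length 1/(grid_size - 1) that fit in v
    # (at least one step, so very small v are resolved only up to the grid)
    s = int(np.floor(v * (grid_size - 1) + 1e-9))
    s = max(1, min(s, grid_size - 1))
    # qf is non-decreasing: the largest increment over a window of width <= v
    # is reached for windows of exactly s steps
    return float(np.max(q[s:] - q[:-s]))

# P uniform on [0, 1], x = -1, r = 2: F_x^{-1}(u) = (1 + u)^2.
for v in (0.1, 0.5):
    print(v, modulus_of_continuity(lambda u: (1.0 + u) ** 2, v), 4 * v - v ** 2)
```

Here the quantile function is convex, so the largest increment sits at the right end of $[0,1]$.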

In the following, it will be sufficient in our results to consider upper bounds on the modulus of continuity, that is, a non-negative function $\bar\omega_x$ on $[0,1]$ such that $\bar\omega_x(v) \ge \omega_x(v)$ for any $v \in [0,1]$. A modulus of continuity being a non-decreasing function, we will assume that such an upper bound $\bar\omega_x$ is non-decreasing on $[0,1]$. For technical reasons and without loss of generality, we will also assume that $\bar\omega_x$ is a continuous function, which takes its values in $[\bar\omega_x(0), \bar\omega_x(1)] \subset \mathbb{R}^+$. For such a function $\bar\omega_x$ we also introduce its inverse function $\bar\omega_x^{-1}$, which is defined on $[\bar\omega_x(0), \bar\omega_x(1)]$. We extend this function to $\mathbb{R}^+$ by taking $\bar\omega_x^{-1}(t) = 0$ for any $t \in [0, \bar\omega_x(0)]$ and $\bar\omega_x^{-1}(t) = 1$ for any $t > \bar\omega_x(1)$. In particular, $\bar\omega_x^{-1}(\bar\omega_x(u)) = u$ for any $u \in [0,1]$.

In this section, we show that the rate of convergence of $\Delta_{n,k/n}(x)$ is of the order of $\frac{1}{\sqrt{k}}\,\bar\omega_x\!\left(\frac{k}{n}\right)$.

2.1 Local analysis of the distance to the empirical measure in the bounded case 

We first consider the behavior of the distance to the empirical measure when the observations $X_1, \dots, X_n$ are sampled from a distribution P with compact support in $\mathbb{R}^d$. Let $F_x^{-1}$ be the quantile function of $\|x - X_1\|^r$ and let $\Delta_{n,k/n}(x)$ be defined as in (5) with $m = k/n$.

Theorem 1. Let x be a fixed observation point in $\mathbb{R}^d$. Assume that $\bar\omega_x : [0,1] \to \mathbb{R}^+$ is an upper bound on the modulus of continuity of $F_x^{-1}$. Assume moreover that $\bar\omega_x$ is an increasing and continuous function on $[0,1]$.
1. For any $\lambda > 0$, if $k < n/2$ then


P \A > A 



2 


( 2 f 

+ exp 

n ) -1 



otherwise 


Al 


< exp — 


kX"^ 


2 



2 + exp - 


+ exp ( ( k^^‘^ 


kX 


16F,-1(^) -F,-i(0) 


+ exp - 


+ exp 


1 


A f 2^/k 


o I 

2 \ n 




:=n(A), (6) 


< exp —2nA^ 


+ exp —2n < uj. 





+ exp - 




A 


Furthermore, in all cases we have $P\left(|\Delta_{n,k/n}(x)| > \lambda\right) = 0$ for any $\lambda > \bar\omega_x(1)$.
2. Assume moreover that $\bar\omega_x(u)/u$ is a non-increasing function; then for any $k \in \{1, \dots, n\}$:

'k' 


E (^|A^ |(x)|) 


< 


< 



n 



where C is an absolute constant. 

(7)

(8)

The proof of Theorem 1 is based on a particular decomposition of $\Delta_{n,k/n}(x)$, stated as a lemma in Appendix B.1. This decomposition allows us to consider the deviations of the empirical process rather than the deviations of the quantile process. The proof is given in Appendix B.

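Let us now comment on the final bound on expectation (8). This bound can be rewritten as follows:

$$\mathbb{E}\,|\Delta_{n,k/n}(x)| \lesssim \frac{n}{k}\cdot\frac{1}{\sqrt{n}}\cdot\sqrt{\frac{k}{n}}\cdot\bar\omega_x\!\left(\frac{k}{n}\right). \qquad (9)$$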


The term $\frac{n}{k}$ comes from the definition of the DTM; it is the renormalization by the mass proportion $m = k/n$. The term $\frac{1}{\sqrt{n}}$ corresponds to a classical parametric rate of convergence. The term $\sqrt{k/n}$ is obtained thanks to a local analysis of the empirical process. More precisely, it derives from a sharp control of the variance of the supremum over the uniform empirical process. The term $\bar\omega_x(k/n)$ corresponds to the statistical complexity of the problem, expressed in terms of the regularity of the quantile function $F_x^{-1}$.

Theorem 1 can be interpreted from either an asymptotic or a non-asymptotic point of view. Taking a non-asymptotic approach, we consider n as fixed. A first result here is that we obtain sharp upper bounds for small values of $k/n$. In the most favorable case where $\bar\omega_x(u) \sim u$, we see in (9) that an upper bound of the order of $\frac{\sqrt{k}}{n}$ is reached. This is a direct consequence of the local analysis we use to control the empirical process in the neighborhood of the origin. As mentioned before, assuming that $k/n$ is very small corresponds to the realistic situation where we use the DTM to clean the support from a small proportion of outliers.
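The $\sqrt{k}/n$ behavior in this favorable case can be observed in a small Monte Carlo experiment. The sketch below assumes P uniform on $[0,1]$, $x = 0$ and $r = 1$, so that $F_x^{-1}(u) = u$, $\omega_x(u) = u$ and $d_{P,k/n,1}(0) = k/(2n)$. The function name and the simulation sizes are illustrative choices of ours, not taken from the experiments of Section 4. If the bound is sharp, the rescaled error $n\,\mathbb{E}|\Delta_{n,k/n}(x)|/\sqrt{k}$ should stay roughly constant as k varies.

```python
import numpy as np

def mean_abs_deviation(n, k, reps=2000, seed=0):
    """Monte Carlo estimate of E|Delta_{n,k/n}(0)| for P uniform on [0, 1], r = 1."""
    rng = np.random.default_rng(seed)
    samples = rng.random((reps, n))  # distances |0 - X_i| = X_i
    # DTEM with r = 1: mean of the k smallest distances in each replicate
    dtem = np.partition(samples, k - 1, axis=1)[:, :k].mean(axis=1)
    dtm = k / (2.0 * n)              # (1/m) * int_0^m u du with m = k/n
    return float(np.mean(np.abs(dtem - dtm)))

n = 2000
ratios = {}
for k in (5, 20, 80):
    err = mean_abs_deviation(n, k)
    ratios[k] = err * n / np.sqrt(k)  # rescaled by the predicted order sqrt(k)/n
    print(k, err, ratios[k])
```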

Now, taking an asymptotic approach, a second result of Theorem 1 is that it allows us to consider the asymptotic behavior of $\Delta_{n,k_n/n}(x)$ under all possible regimes, that is, for all sequences $(k_n)_{n \in \mathbb{N}}$. For instance, with the classical approach where $k_n$ is such that $k_n/n = m$ for some fixed value $m \in (0,1)$, we obtain the parametric rate of convergence $1/\sqrt{n}$, as in the asymptotic functional results given in Chazal et al. (2014a).

Another key fact about Theorem 1 is that the upper bound (9) depends on the regularity of $F_x^{-1}$ through the function

$$m \in (0,1) \mapsto \frac{\bar\omega_x(m)}{\sqrt{m}}.$$

Moreover, if $\omega_x(0^+) = 0$, we see that the bounds of Theorem 1 depend on the regularity of $F_x^{-1}$ only at 0 for n large enough. For instance, if $k_n$ is such that $k_n/n = m$ for some fixed value $m \in (0,1)$ such that $F_x^{-1}(m) > F_x^{-1}(0)$, coming back to the bounds of Theorem 1, we find that for n large enough:


wr I = wx ) < -Fx ^ M - f; ^(0). 


In this context, the right-hand side of the bound of Theorem 1 is of the order of $\Psi_x(m)/\sqrt{n}$, where

$$\Psi_x : m \in (0,1) \mapsto \frac{F_x^{-1}(m) - F_x^{-1}(0)}{\sqrt{m}}.$$
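To make $\Psi_x$ concrete, it can be evaluated in closed form for simple distributions. The sketch below (hypothetical helper `psi`) assumes P uniform on $[0,1]$. For $x = 0$ and $r = 1$, $F_x^{-1}(u) = u$ and $\Psi_x(m) = \sqrt{m}$. For $x = -1$ and $r = 2$, $F_x^{-1}(u) = (1+u)^2$ and $\Psi_x(m) = 2\sqrt{m} + m^{3/2}$. In both cases $\Psi_x$ is increasing, so for fixed n the bound grows with $m$.

```python
import numpy as np

def psi(qf, m):
    """Psi_x(m) = (F_x^{-1}(m) - F_x^{-1}(0)) / sqrt(m) for a vectorized quantile function qf."""
    m = np.asarray(m, dtype=float)
    return (qf(m) - qf(np.zeros_like(m))) / np.sqrt(m)

ms = np.array([0.01, 0.1, 0.5])
print(psi(lambda u: u, ms))               # x = 0, r = 1: equals sqrt(m)
print(psi(lambda u: (1.0 + u) ** 2, ms))  # x = -1, r = 2: equals 2 sqrt(m) + m**1.5
```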


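We now give additional remarks about Theorem 1.

Remark 1. If the quantile function $F_x^{-1}$ is $\eta$-Hölder, then $\bar\omega_x(u) = Au^{\eta}$ for some constant $A > 0$, and thus, by (9), we have

$$\mathbb{E}\,|\Delta_{n,k/n}(x)| \lesssim \frac{A}{\sqrt{k}}\left(\frac{k}{n}\right)^{\eta}.$$

Remember that Hölder functions with power $\eta > 1$ are constant; we can thus assume that $\eta \le 1$.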

Remark 2. Assuming that $\bar\omega_x(u)/u$ is a non-increasing function roughly means that $\bar\omega_x$ is a concave function. Our result thus holds if we can find a concave function which is an upper bound on the modulus of continuity of the quantile function. We show in Section 3.4 that this is the case for a large class of measures.


Remark 3. For values of $k/n$ not close to zero, the rate is consistent with the upper bound (15) deduced from the approach based on the stability results (see Appendix A). However, Theorem 1 is more satisfactory since it describes the statistical complexity of the problem through the regularity of the quantile function.

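Remark 4. The application $u \mapsto u^{1/r}$ is $1/r$-Hölder on $\mathbb{R}^+$ with Hölder constant 1 since $1/r \le 1$. It yields:

$$|\tilde\Delta_{n,k/n,r}(x)| \le |\Delta_{n,k/n,r}(x)|^{1/r}, \qquad (10)$$

where $\tilde\Delta_{n,k/n,r}(x)$ is defined by (3). We deduce an expectation bound on $\tilde\Delta_{n,k/n,r}(x)$ from Jensen's inequality and Inequality (10):

$$\mathbb{E}\,|\tilde\Delta_{n,k/n,r}(x)| \le \left(\mathbb{E}\,|\Delta_{n,k/n,r}(x)|\right)^{1/r} \lesssim \left[\frac{1}{\sqrt{n}}\left(\frac{k}{n}\right)^{-1/2}\bar\omega_x\!\left(\frac{k}{n}\right)\right]^{1/r}.$$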
Remark 5. As already mentioned, to prove Theorem 1 we consider the deviations of the empirical process rather than the deviations of the quantile process. Indeed, the more direct approach, which consists in directly controlling the deviations of the quantile process, gives slower rates. More precisely, using a proposition given in the appendix and borrowed from Shorack and Wellner (2009), it can be shown that

wi)< 


E 






For instance, if $\bar\omega_x(u) = Au^{\eta}$, the bound obtained in this way on $\mathbb{E}|\Delta_{n,k/n}(x)|$ is slower than the rate given in Remark 1.


To complete the results of Theorem 1, we give below a lower bound using Le Cam's lemma (recalled in the appendix). Let $\omega$ be a continuous and increasing function on $[0,1]$ and let $x \in \mathbb{R}^d$. We introduce the class of probability measures

$$\mathcal{P}_{\omega} := \left\{P \text{ probability measure on } \mathbb{R}^d \text{ such that } \omega(u) \ge \omega_x(u) \text{ for any } u \in [0,1]\right\}.$$

In the previous definition, the function $\omega_x$ is, as before, the modulus of continuity of the quantile function of the push-forward measure of P by the function $y \mapsto \|y - x\|^r$.

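Proposition 1. Assume that there exist $P \in \mathcal{P}_{\omega}$, $c > 0$ and $\bar u \in (0,1)$ such that

$$c\left[F_x^{-1}(u) - F_x^{-1}(0)\right] \ge \omega(u) \quad \text{for any } u \in (0, \bar u]. \qquad (11)$$

Then there exists a constant C which only depends on c such that, for any $k < \bar u n$,

$$\sup_{P \in \mathcal{P}_{\omega}} \mathbb{E}\,|\Delta_{n,k/n}(x)| \ \ge\ \inf_{\hat d_n(x)} \sup_{P \in \mathcal{P}_{\omega}} \mathbb{E}\left|\hat d_n(x) - d^r_{P,k/n,r}(x)\right| \ \ge\ C\,\frac{n}{k}\cdot\frac{1}{n}\,\omega\!\left(\frac{k-1}{n}\right),$$

where the infimum is taken over all estimators $\hat d_n(x)$ of $d^r_{P,m,r}(x)$, with $m = k/n$, defined from a sample $X_1, \dots, X_n$ of distribution P.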

Assumption (11) is not very strong. It means that $\omega$ is not too large an upper bound on the moduli of continuity of the quantile functions. More precisely, it says that there exists a distribution $P \in \mathcal{P}_{\omega}$ for which $\omega$ is comparable to the modulus of continuity of the quantile function $F_x^{-1}$ in the neighborhood of the origin.

Note that this lower bound matches the upper bound of Theorem 1 when k is very small, since it is then of the order of $\omega(1/n)$. Providing the correct lower bound for all values of k is not obvious. As far as we know, there is no standard method in the literature for computing lower bounds for this kind of functional, and we consider this issue to be beyond the scope of this paper.


2.2 Local analysis of the distance to the empirical measure in the unbounded case 

The previous results provide a description of the fluctuations and mean rates of convergence of the empirical distance to measure. However, when the support of P is not bounded, the quantile function $F_x^{-1}$ tends to infinity at 1 and the modulus of continuity of $F_x^{-1}$ is not finite. In such a situation, Theorem 1 cannot be applied. We now propose a second result about the fluctuations of the DTEM, under weaker assumptions on the regularity of $F_x^{-1}$. The following result shows that, under a weak moment assumption, the rate of convergence is the same as in the bounded case, up to a term decreasing exponentially fast to zero.
Theorem 2. Let $\bar m \in (0,1)$ and let x be some observation point in $\mathbb{R}^d$. Assume that $\bar\omega_{x,\bar m}$ is an upper bound on the modulus of continuity of $F_x^{-1}$ on $(0, \bar m]$: for any $(u, u') \in [0, \bar m]^2$,

$$\left|F_x^{-1}(u) - F_x^{-1}(u')\right| \le \bar\omega_{x,\bar m}(|u - u'|). \qquad (12)$$

Assume moreover that $\bar\omega_{x,\bar m}$ is an increasing and continuous function on $[0, \bar m]$. Then, for any $k < n\left(\frac{1}{2} \wedge \bar m\right)$ and any $\lambda > 0$:


p(|A^^^(x)|>a) 

2 


< □(A) + exp 
+ exp 

+ < 2 exp 




n—k-\-l' 


(13) 


where the first term on the right-hand side is the upper bound defined in (6) in Theorem 1, with $\bar\omega_x$ replaced by $\bar\omega_{x,\bar m}$. Assume moreover that $\bar\omega_{x,\bar m}(u)/u$ is a non-increasing function and that P has a moment of order r. Then


E 







Fx\^) 


H” 


x,m 


y/k 


n 


> ~\~C X 


exp 



where C is an absolute constant and $C_{x,r,\bar m}$ only depends on the quantity $\mathbb{E}\|X - x\|^r$ and on $\bar m$.

As in the bounded case, if $\bar\omega_{x,\bar m}(0^+) = 0$ and if $F_x^{-1}(m) > F_x^{-1}(0)$ (with $k_n/n = m$ fixed), then the rate of convergence is still of the same order as in the bounded case. Note that this result is interesting even when the measure P is supported on a compact set. Indeed, assume that the quantile function $F_x^{-1}$ is not continuous; then $\omega_x(0) > 0$. However, if $F_x^{-1}$ is smooth in the neighborhood of zero, for $\bar m$ small enough the assumption (12) may be satisfied with a function $\bar\omega_{x,\bar m}$ which can be very small in the neighborhood of zero. Theorem 2 may provide better bounds in this context than those given by Theorem 1. This fact also confirms that the deviations of the DTEM mainly rely on the local regularity of the quantile function $F_x^{-1}$ at the origin rather than on its global regularity.


2.3 Convergence of the distance to the empirical measure for the sup norm 

The previous results address the pointwise fluctuations of the DTEM. We now consider the same problem for the sup norm metric on a compact domain $\mathcal{D}$ of $\mathbb{R}^d$. Let $N(\mathcal{D}, t)$ be the covering number of $\mathcal{D}$, that is, the smallest number of balls $B(x_i, t)$ with $x_i \in \mathcal{D}$ such that $\bigcup_i B(x_i, t) \supset \mathcal{D}$. Since the domain $\mathcal{D}$ is compact, there exist two positive constants c and $\nu \le d$ such that for any $t > 0$:

N{P,t) < VI. 

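We assume that there exists a function $\omega_{\mathcal{D}} : (0,1] \to \mathbb{R}^+$ which uniformly upper bounds the moduli of continuity of the quantile functions $(F_x^{-1})_{x \in \mathcal{D}}$: for any $(u, u') \in (0,1]^2$ and for any $x \in \mathcal{D}$,

$$\left|F_x^{-1}(u) - F_x^{-1}(u')\right| \le \omega_{\mathcal{D}}(|u - u'|).$$

We also assume, as before, that $\omega_{\mathcal{D}}$ is an increasing and continuous function on $[0,1]$.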
Theorem 3. Under the previous assumptions, for any $k < n/2$,


E sup|A fc(x)| < 


,x£V 


C_ 

n 


< 


C_ 

n 


k 

n 

+ 

k' 

n 


- 1/2 


c 


P-1 


n 


log’' 


k 




n 


(i)--f’- («)]' 

n i'-In 
y/k 


J/+5' 






- 1/2 


WD - log^ 

V ^ / 




i-FU\0) 


1 2u+5 


A 


U!x> 


y/fc 


1 u-1 


where $\log^+(u) = (\log u) \vee 1$ for any $u \in \mathbb{R}^+$. The constant C is an absolute constant if $r = 1$; otherwise it depends on r and on the Hausdorff distance between $\mathcal{D}$ and the support of P.

This bound is deduced from a deviation bound on $\sup_{x \in \mathcal{D}} |\Delta_{n,k/n}(x)|$, which is given in the proof. Up to a logarithmic term, the rate is the same as for the pointwise convergence. As for the pointwise convergence, this result could easily be extended to the case of non-compactly supported measures.


3 The geometric information carried by the quantile function $F_x^{-1}$

The upper bounds we obtain in the previous section directly depend on the regularity of $F_x^{-1}$. We now give some insights about how the geometry of the support of the measure in $\mathbb{R}^d$ impacts the quantile function $F_x^{-1}$.

3.1 Compact support and modulus of continuity of the quantile function

A geometric characterization of the finiteness of $\omega_x$ on $[0,1]$ can be given in terms of the support of the measure P. The following lemma is borrowed and adapted from Proposition A.12 in Bobkov and Ledoux (2014):

Lemma 1. Given a measure P in $\mathbb{R}^d$ and an observation point $x \in \mathbb{R}^d$, the following properties are equivalent:

1. the modulus of continuity of the quantile function $F_x^{-1}$ satisfies $\omega_x(u) < \infty$ for any $u \le 1$;

2. the push-forward distribution of P by the function $\|x - \cdot\|^r$ is compactly supported;

3. P is compactly supported.

In particular, if P is compactly supported on K, we can always take as an upper bound on $\omega_x$ the constant function $\bar\omega_x = \mathrm{Haus}(\{x\}, K)^r$. Of course, this is not a very relevant choice to describe the rate of convergence of the DTEM.

3.2 Connectedness of the support and modulus of continuity of the quantile function

While discontinuity points of the distribution function correspond to atoms, discontinuity points of the quantile function correspond to areas with empty mass in $\mathbb{R}^+$ (see the right picture of Figure 1). The fact that $\omega_x(0^+) = 0$ is directly related to the connectedness of the support of the distribution $dF_x$. Indeed, it is equivalent to assuming that the support of $dF_x$ is a closed interval in $\mathbb{R}^+$; see for instance Proposition A.7 in Bobkov and Ledoux (2014).

In the most favorable situations where the support of P is a connected set, $\omega_x(0) = 0$, and the faster $\omega_x$ tends to 0 at 0, the better the rate we obtain. However, for some points $x \in \mathbb{R}^d$ it is also possible for the support of $dF_x$ to be an interval even when the support of P is not a connected set of $\mathbb{R}^d$ (see the left picture of Figure 1). In the other case, when the support of $dF_x$ is not a connected set, the term $\omega_x(0)$ roughly corresponds to the maximum distance between two consecutive intervals of the support of $dF_x$ (see the right picture of Figure 1). Our results can still be applied in these situations, but the upper bounds we obtain in this case are larger because $\bar\omega_x(k/n)$ cannot be smaller than $\omega_x(0)$.

Figure 1: Left: one situation where the support of P is not a connected set whereas the support of $dF_x$ is (for $r = 1$); the quantile function $F_x^{-1}$ is continuous. Right: one situation where the support of $dF_x$ is not a connected set; the quantile function $F_x^{-1}$ is not continuous.
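A gap in the support of $dF_x$ can be made visible numerically. In the sketch below (hypothetical helper names), P is assumed uniform on $[0,1] \cup [2,3]$, with $x = 0$ and $r = 1$. The distances are then uniform on $[0,1] \cup [2,3]$ with mass $1/2$ on each piece, and $F_x^{-1}(u) = 2u$ for $u \le 1/2$ and $2u + 1$ otherwise. The quantile function jumps by 1 at $u = 1/2$, so $\omega_x(v) = 1 + 2v$ does not vanish as $v \to 0$. Here $\omega_x(0) = 1$, the length of the gap.

```python
import numpy as np

def quantile_two_intervals(u):
    """F_x^{-1} for P uniform on [0,1] U [2,3], x = 0, r = 1."""
    u = np.asarray(u, dtype=float)
    return np.where(u <= 0.5, 2.0 * u, 2.0 * u + 1.0)

def omega_grid(qf, v, grid_size=2001):
    """Grid approximation of the modulus of continuity of a non-decreasing qf."""
    u = np.linspace(0.0, 1.0, grid_size)
    q = qf(u)
    s = max(1, int(np.floor(v * (grid_size - 1) + 1e-9)))
    return float(np.max(q[s:] - q[:-s]))

for v in (0.1, 0.01, 0.001):
    # stays close to 1 + 2v: omega_x(0+) equals the gap length 1, not 0
    print(v, omega_grid(quantile_two_intervals, v))
```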


3.3 Uniform modulus of continuity of $F_x^{-1}$ versus local continuity of $F_x^{-1}$ at the origin

Though stronger than continuity, a natural regularity assumption on $F_x^{-1}$ is to assume that this function is also concave:

Lemma 2. If $F_x^{-1}$ is concave, then we can take $\bar\omega_x = F_x^{-1} - F_x^{-1}(0)$. In particular, if x is in the support of P, then we can take $\bar\omega_x = F_x^{-1}$.
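Lemma 2 can be checked on a simple example. Assume P uniform on the unit disk of $\mathbb{R}^2$, with $x = 0$ and $r = 1$. Then $F_x(t) = t^2$ on $[0,1]$, so $F_x^{-1}(u) = \sqrt{u}$ is concave, and the largest increment over a window of width v is the one starting at the origin: $\omega_x(v) = \sqrt{v} = F_x^{-1}(v) - F_x^{-1}(0)$. The grid computation below is an illustrative sketch with a hypothetical helper name.

```python
import numpy as np

def omega_grid(qf, v, grid_size=1001):
    """Grid approximation of the modulus of continuity of a non-decreasing qf."""
    u = np.linspace(0.0, 1.0, grid_size)
    q = qf(u)
    s = max(1, int(np.floor(v * (grid_size - 1) + 1e-9)))
    return float(np.max(q[s:] - q[:-s]))

qf_disk = np.sqrt  # F_x^{-1}(u) = sqrt(u): uniform disk, x at the center, r = 1
for v in (0.01, 0.04, 0.25):
    # concave quantile: omega_x(v) coincides with F_x^{-1}(v) - F_x^{-1}(0)
    print(v, omega_grid(qf_disk, v), qf_disk(v) - qf_disk(0.0))
```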


If we take $r = 1$, in many simple situations we note that the cumulative distribution function $F_{x,1}$ roughly behaves as a power function $t^{\ell}$, where $\ell$ is the dimension of the support. In this context, the quantile function $F_{x,1}^{-1}$ roughly behaves as the power function $u^{1/\ell}$. We then have that $F_x^{-1}(u) = \left[F_{x,1}^{-1}(u)\right]^r$ behaves as $u^{r/\ell}$. This is for instance the case for (a,b)-standard measures, as shown in the next section. These considerations suggest that if $r/\ell \le 1$, in many situations the quantile function is concave and then $\bar\omega_x$ is of the order of $F_x^{-1} - F_x^{-1}(0)$. This means that the upper bound on $\mathbb{E}|\Delta_{n,k/n}|$ is of the order of $\Psi_x(k/n)/\sqrt{n}$.

More generally, as noticed in the comments following Theorem 1, the term $F_x^{-1}(k/n) - F_x^{-1}(0)$ is the dominating term in the upper bound of Theorem 1. We may check with the numerical experiments of Section 4 that the function $\Psi_x$ does capture the correct monotonicity of $\mathbb{E}|\Delta_{n,k/n}|$ as a function of $k/n$.


3.4 The case of (a, b)-standard measures

The intrinsic dimensionality of a given measure in $\mathbb{R}^d$ can be quantified by the so-called (a,b)-standard assumption, which assumes that there exist $a' > 0$, $\rho_0 > 0$ and $b > 0$ such that

$$\forall x \in K,\ \forall \rho \in (0, \rho_0),\quad P(B(x,\rho)) \ge a'\rho^{b},$$

where K is the support of P. This assumption is popular in the literature about set estimation (see for instance Cuevas 2009; Cuevas and Rodríguez-Casal 2004). More recently, it has also been used in Chazal et al. (2014b, 2015); Fasy et al. (2014) for statistical analysis in Topological Data Analysis.

Since K is compact, by reducing the constant $a'$ to a smaller constant a if necessary, we easily check that this assumption is equivalent to

$$\forall x \in K,\ \forall \rho > 0,\quad P(B(x,\rho)) \ge 1 \wedge a\rho^{b}.$$

We now give control on the two key terms $\bar\omega_x$ and $F_x^{-1}(u) - F_x^{-1}(0)$, which are involved in the bounds on expectations of Section 2.

Lemma 3. Let P be a probability measure on $\mathbb{R}^d$ which is (a,b)-standard on its support K. Then, for any $u \in [0,1]$,

$$F_x^{-1}(u) - F_x^{-1}(0) \le r\left(\frac{u}{a}\right)^{1/b}\left[\left(\frac{u}{a}\right)^{1/b} + \|K - x\|\right]^{r-1},$$

where r is the power parameter in the definition (2) of the DTM. Assume moreover that K is a connected set of $\mathbb{R}^d$. Then, for any $h \in (0,1)$ we have


h\ 


i/b 


bbx{h) < r ( — j Haus ({x}, K) 


r—1 
Figure 2: About the modulus of continuity of the quantile function $F_x^{-1}$ in the case of (a,b)-standard measures in $\mathbb{R}^d$.


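Proof. We have (see the left picture of Figure 2)

$$F_x(t) \ge P\left(B\left(\pi_K(x),\ t^{1/r} - \|K - x\|\right)\right),$$

where $\pi_K(x)$ is a point of K which satisfies $\|K - x\| = \|\pi_K(x) - x\|$. Then $F_x(t) \ge 1 \wedge a\left(t^{1/r} - \|K - x\|\right)^{b}$ and we find that $F_x^{-1}(u) \le \left[\left(\frac{u}{a}\right)^{1/b} + \|K - x\|\right]^{r}$. Next, we have $F_x^{-1}(0) = \|K - x\|^r$, and the first point derives by upper bounding the derivative of $t \mapsto \left[t + \|K - x\|\right]^{r}$.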

We now assume that K is a connected set. Let $(u, h) \in (0,1)^2$ be such that $u + h \le 1$ and $F_x^{-1}(u) > F_x^{-1}(0)$. We can also assume that $F_x^{-1}(u + h) > F_x^{-1}(u)$. Let $a(h) = \left[F_x^{-1}(u+h)\right]^{1/r} - \left[F_x^{-1}(u)\right]^{1/r}$ (see the right picture of Figure 2). By definition of a quantile, there exists a point $x_1 \in K \cap \left(B\left(x, [F_x^{-1}(u+h)]^{1/r}\right) \setminus B\left(x, [F_x^{-1}(u)]^{1/r}\right)\right)$. If $F_x^{-1}(u) > 0$, then for the same reason there exists a point $x_2 \in K \cap B\left(x, [F_x^{-1}(u)]^{1/r}\right)$. If $F_x^{-1}(u) = 0$, then $x \in K$ and we take $x_2 = x$. Next, since K is a connected set, there exists a point $x_3 \in K \cap \partial B\left(x, [F_x^{-1}(u)]^{1/r} + \frac{a(h)}{2}\right)$. The measure P being (a,b)-standard, we find that


h> P [ B [ X 3 , 


> a 


a{h) 


a{h) 


Then, 

and thus 
which proves the Lemma 


F-\t + h) - F-\t) < ra-'"’ (f7‘(t + h'!’’, 


□ 
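As a quick sanity check of the first bound of Lemma 3, the following Python sketch (an illustration, not part of the original experiments) takes $P$ uniform on $K = [0,1] \subset \mathbb{R}$, which is $(a,b)$-standard with $a = b = 1$, and an observation point $x = -d$ outside $K$. In this case $F_x(t) = \min(1, \max(0, t^{1/r} - d))$, so that $F_x^{-1}(u) = (u + d)^r$ and $\|K - x\| = d$. The helper names are hypothetical.

```python
import numpy as np

def quantile_segment(u, d, r):
    """F_{x,r}^{-1}(u) for P uniform on K = [0, 1] and x = -d with d >= 0.

    Here F_{x,r}(t) = min(1, max(0, t**(1/r) - d)), hence F^{-1}(u) = (u + d)**r.
    """
    return (u + d) ** r

def lemma3_bound(u, d, r, a=1.0, b=1.0):
    """Right-hand side of the first inequality of Lemma 3, with ||K - x|| = d."""
    s = (u / a) ** (1.0 / b)
    return r * s * (s + d) ** (r - 1)

def max_violation(d, r, grid=10_001):
    """Largest value of (F^{-1}(u) - F^{-1}(0)) - bound(u) over a grid of u in [0, 1].

    A non-positive result (up to rounding) means the bound holds on the grid.
    """
    u = np.linspace(0.0, 1.0, grid)
    lhs = quantile_segment(u, d, r) - quantile_segment(0.0, d, r)
    return float(np.max(lhs - lemma3_bound(u, d, r)))

for d in (0.0, 0.5, 2.0):
    for r in (1.0, 2.0, 3.0):
        print(f"d={d}, r={r}: max violation = {max_violation(d, r):.2e}")
```

For this example the bound reduces to $(u+d)^r - d^r \le r\,u\,(u+d)^{r-1}$, which is the mean value inequality for the convex map $s \mapsto s^r$.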
[Figure 3 panels: one sample of 500 observations in $\mathbb{R}$ (left); one sample for the clutter noise model (right).]

Figure 3: Left: samples drawn for each generative model for the Segment Experiment. Right: one sample drawn from the clutter noise model for the 2-d Shape Experiment. The observation point is represented by a blue cross.


4 Numerical experiments

In this section, we illustrate with numerical experiments that the expectation bounds given on $\Delta_{n,k/n,r}(x)$ in Section are sharp. In particular, we check that the function $\Psi_x$ has the same monotonicity as the function $m \mapsto \mathbb{E}|\Delta_{n,m,r}(x)|$.

We consider four different geometric shapes in $\mathbb{R}$, $\mathbb{R}^2$ and $\mathbb{R}^3$, for which a visualization is possible: see Figures 3 and 4.

• Segment Experiment in $\mathbb{R}$. The shape $K$ is the segment $[0,1]$ in $\mathbb{R}$.

• 2-d Shape Experiment in $\mathbb{R}^2$. A closed curve has been drawn by hand in $\mathbb{R}^2$. It has then been approximated by a polygonal curve with high precision. The shape $K$ is the compact set delimited by the polygonal curve.

• Fish Experiment: a 2-d surface in $\mathbb{R}^3$. The shape $K$ is the discrete set defined by a point cloud of 216979 points approximating a 2-d surface representing a fish. This dataset is provided courtesy of CNR-IMATI by the AIM@SHAPE-VISIONAIR Shape Repository.

• Tangle Cube Experiment in $\mathbb{R}^3$. The shape $K$ is the tangle cube, that is the 3-d manifold defined as the set of points $(x_1, x_2, x_3) \in \mathbb{R}^3$ such that $x_1^4 - 5x_1^2 + x_2^4 - 5x_2^2 + x_3^4 - 5x_3^2 + 10 \le 0$.
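The Tangle Cube is given by an implicit inequality, so one simple way to draw from the uniform distribution on $K$ is rejection sampling from a bounding box. The sketch below assumes this approach (the text does not say how the authors sampled $K$). The box $[-2.4, 2.4]^3$ is our choice; it contains $K$ because each coordinate $s$ of a point of $K$ satisfies $s^4 - 5s^2 \le 5/2$, that is $|s| \le 2.34$ approximately.

```python
import numpy as np

def in_tangle_cube(points):
    """Boolean mask of the rows (x1, x2, x3) satisfying
    x1^4 - 5 x1^2 + x2^4 - 5 x2^2 + x3^4 - 5 x3^2 + 10 <= 0."""
    p = np.asarray(points, dtype=float)
    return np.sum(p**4 - 5.0 * p**2, axis=1) + 10.0 <= 0.0

def sample_tangle_cube(n, rng, half_width=2.4, batch=10_000):
    """Draw n points from the uniform distribution on the tangle cube
    by rejection sampling from the box [-half_width, half_width]^3."""
    accepted, count = [], 0
    while count < n:
        cand = rng.uniform(-half_width, half_width, size=(batch, 3))
        keep = cand[in_tangle_cube(cand)]   # uniform candidates that fall in K
        accepted.append(keep)
        count += len(keep)
    return np.concatenate(accepted)[:n]

rng = np.random.default_rng(0)
sample = sample_tangle_cube(2000, rng)
print(sample.shape)  # (2000, 3)
```

Points accepted from a uniform proposal are uniform on $K$, so no reweighting is needed.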


For each shape, we consider three generative models. These models are standard in support estimation and geometric inference, see Genovese et al. (2012) for instance.

• Noiseless model: $X_1, \dots, X_n$ are sampled from the uniform probability distribution $P_{uni}$ on $K$.

• Clutter noise model: $X_1, \dots, X_n$ are sampled from the mixture distribution $P_{cl} = \pi U + (1 - \pi)P_{uni}$, where $U$ is the uniform measure on a box $B$ which contains $K$ and where $\pi$ is a proportion parameter.

• Gaussian convolution model: $X_1, \dots, X_n$ are sampled from the distribution $P_g = P_{uni} * \Phi(0, \sigma I_d)$, where $\Phi(0, \sigma I_d)$ is the centered isotropic multivariate Gaussian distribution on $\mathbb{R}^d$ with covariance matrix $\sigma I_d$. We take $\sigma = 0.5$ in all the experiments.

Figure 4: Left: 3-d plot of the shape for the Fish Experiment. Right: a 3-d plot of a sample drawn from the uniform measure on the Tangle Cube. The observation point is represented by the blue point outside of the shape.


We use the same notation $P$ for any of the probability distributions $P_{uni}$, $P_{cl}$ or $P_g$. An observation point $x$ is fixed for each experiment. For each experiment and each generative model, from a very large sample drawn from $P$ we compute very accurate estimations of the quantile functions $F_x^{-1}$ and of the DTM $d_{P,m,r}(x)$. Next, we simulate $n$-samples from $P$ and we compute the DTEM for each sample. We take $n = 500$ for the first two experiments and $n = 2000$ for the two others. The trials are all repeated 100 times, and we finally compute approximations of the error $\mathbb{E}|\Delta_{n,k/n,r}(x)|$ with a standard Monte Carlo procedure, for all the measures $P$. The DTMs and the DTEMs are computed for the powers $r = 1$ and $r = 2$, and also for $r = 3$ for the Tangle Cube Experiment. We also compute the function $m \mapsto \Psi_x(m)$. The simulations have been performed using the R software (R Core Team 2014) and we have used the packages FNN, rgl, grImport and sp.
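The Monte Carlo procedure described above can be sketched as follows for the Segment Experiment, here in Python with NumPy rather than the R packages used by the authors. All settings not given in the text are assumptions chosen for illustration: the clutter proportion $\pi = 0.1$, the box $B = [-1, 2]$, the observation point $x = 1.5$, the grid of values of $k$ and the size of the reference sample. Following the text, $\sigma = 0.5$ is the variance of the Gaussian kernel. The DTM of $P$ is approximated from a large reference sample, and the DTEM is the mean of the $k$ smallest values of $|x - X_i|^r$.

```python
import numpy as np

def sample_segment(model, size, rng, pi=0.1, box=(-1.0, 2.0), sigma=0.5):
    """Draw from P_uni, P_cl or P_g for K = [0, 1] in R (pi and box are assumed values)."""
    clean = rng.uniform(0.0, 1.0, size)
    if model == "noiseless":
        return clean
    if model == "clutter":
        noise = rng.uniform(box[0], box[1], size)
        return np.where(rng.uniform(size=size) < pi, noise, clean)
    if model == "gaussian":
        # sigma is the variance of the Gaussian kernel, so the scale is sqrt(sigma)
        return clean + rng.normal(0.0, np.sqrt(sigma), size)
    raise ValueError(f"unknown model: {model}")

def dtm_r(sorted_dist_r, k):
    """d^r at mass k/N for an empirical measure: mean of the k smallest |x - X_i|^r."""
    return sorted_dist_r[:k].mean()

def mc_error(model, x=1.5, n=500, ks=(5, 25, 50, 100, 150), r=1.0,
             trials=100, ref_size=10**6, seed=0):
    """Monte Carlo estimate of E|Delta_{n,k/n,r}(x)| for each k in ks."""
    rng = np.random.default_rng(seed)
    ref = np.sort(np.abs(sample_segment(model, ref_size, rng) - x) ** r)
    errors = {}
    for k in ks:
        m = k / n
        d_ref = dtm_r(ref, int(round(m * ref_size)))  # approximation of d^r_{P,m,r}(x)
        diffs = [
            dtm_r(np.sort(np.abs(sample_segment(model, n, rng) - x) ** r), k) - d_ref
            for _ in range(trials)
        ]
        errors[m] = float(np.mean(np.abs(diffs)))
    return errors

for model in ("noiseless", "clutter", "gaussian"):
    print(model, mc_error(model))
```

Plotting the returned dictionary against $m = k/n$ gives a curve of the kind shown in the middle panels of Figure 5.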


Results

Figures 5 to 8 give the results of the four experiments with the three generative models. The top graphics of Figures 5 to 8 represent the quantile functions $F_x^{-1}$ in each case. For the noiseless models, the behavior of $F_x^{-1}$ at the origin is directly related to the power $r$ and to the intrinsic dimension of the shape. For $r = 1$, the quantile function is linear for the segment, it is roughly in $\sqrt{m}$ for the 2-d shape and for the Fish Experiment, and it is of order $m^{1/3}$ for the Tangle Cube. We observe that $F_x^{-1}$ is roughly linear with $r = 2$ for the 2-d shape and the Fish shape, and with $r = 3$ for the Tangle Cube.

In the four cases, the quantile functions of the noise models start from zero, since the observation point is always taken inside the supports of $P_{cl}$ and $P_g$. A regularity break in the quantile function of the clutter noise model can be observed in the neighborhood of $m = P_{cl}(B(x, \|K - x\|))$. The quantile functions for the Gaussian noise are always smoother.

The main point of these experiments is that, in all cases, the function $m \mapsto \Psi_x(m)$ shows the same monotonicity as the expected error studied in the paper, $m \mapsto \mathbb{E}|\Delta_{n,m,r}(x)|$. These results confirm that the function $\Psi_x$ provides a correct description of $\mathbb{E}|\Delta_{n,m,r}(x)|$.

We also observe that the function $m \mapsto \mathbb{E}|\Delta_{n,m,r}(x)|$ does not have one typical shape: it can be an increasing curve, a decreasing curve or even a U-shaped curve. Indeed, the monotonicity depends on many factors, including the intrinsic dimension of the shape, its geometry, the presence of noise and the power coefficient $r$.


5 Conclusion

When the data is corrupted by noise, the distance to measure is one key tool for performing robust geometric inference. For instance, it can be used for support estimation and for topological data analysis using persistence diagrams, as proposed in Chazal et al. (2014a). In practice, a "plug-in" approach is adopted by replacing the measure by its empirical counterpart in the definition of the DTM. The main result of this paper is to provide sharp non-asymptotic bounds on the deviations of the DTEM.

The DTM has been recently extended to the context of metric spaces in Buchet et al. (2015b). For the sake of simplicity, we have assumed that $P$ is a probability measure in $\mathbb{R}^d$. However, all the results of the paper can be easily adapted to more general metric spaces by considering the push-forward distribution of $P$ by $d(x,\cdot)^r$, where $d$ is the metric in the sampling space.

This paper is a step toward a complete theory of robust geometric inference. Our results give preliminary insights about how to tune the parameter $m$ in the DTEM, which is a difficult question. The experiments proposed in Section 4 show that the term $\mathbb{E}|\Delta_{n,m,r}(x)|$ does not have a typical monotonic behavior with regard to $m$, and thus classical model selection methods can hardly be applied to this problem. We intend to study this non-standard model selection problem in future works.


A Rates of convergence derived from the DTM stability

The DTM satisfies several stability properties for the Wasserstein metrics. In this section, rates of convergence of the DTEM are derived from stability results of the DTM together with known results about the convergence of the empirical measure under Wasserstein metrics. We check that the results derived in this way are not as tight as the results given in Section.

Let us first recall the definition of the Wasserstein metrics in $\mathbb{R}^d$. For $r \ge 1$, the Wasserstein distance $W_r$ between two probability measures $P$ and $\tilde P$ on $\mathbb{R}^d$ is given by

$$W_r(P, \tilde P) = \left(\inf_{\pi \in \Pi(P,\tilde P)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \|x - y\|^r\,\pi(dx, dy)\right)^{1/r},$$

where $\Pi(P, \tilde P)$ is the set of probability measures on $\mathbb{R}^d \times \mathbb{R}^d$ with marginal distributions $P$ and $\tilde P$, see for instance Rachev and Rüschendorf (1998) or Villani (2008).

The stability of the DTM with respect to the Wasserstein distance $W_r$ is given by the following theorem.

Theorem 4 (Chazal et al. (2011b)). Let $P$ and $\tilde P$ be two probability measures on $\mathbb{R}^d$. For any $r \ge 1$ and any $m \in (0,1)$ we have

$$\|d_{P,m,r} - d_{\tilde P,m,r}\|_\infty \le m^{-1/r}\,W_r(P, \tilde P).$$

Notice that Chazal et al. (2011b) prove this theorem for $r = 2$, but the proof for any $r \ge 1$ is exactly the same.

We now give the pointwise stability of the DTM with respect to the Kantorovich distance $W_1$ between push-forward measures on $\mathbb{R}$. This result easily derives from the expression of the DTM in terms of quantile functions given in the Introduction; a rigorous proof is given in Appendix B.1.
Figure 5: Quantile functions (top), expected error $\mathbb{E}|\Delta_{n,k/n,r}(x)|$ (middle) and theoretical upper bounds $\Psi_x$ (bottom) with powers $r = 1$ (left) and $r = 2$ (right), for the Segment Experiment.

Figure 6: Quantile functions (top), expected error $\mathbb{E}|\Delta_{n,k/n,r}(x)|$ (middle) and theoretical upper bounds $\Psi_x$ (bottom) with powers $r = 1$ (left) and $r = 2$ (right), for the 2-d Shape Experiment.


[Plot panels: quantile functions $F_x^{-1}$, expected error (labelled E(An)) and theoretical bounds plotted against $m = k/n \in [0, 0.30]$, for the noiseless, clutter noise and Gaussian noise models.]

Figure 7: Quantile functions (top), expected error $\mathbb{E}|\Delta_{n,k/n,r}(x)|$ (middle) and theoretical upper bounds $\Psi_x$ (bottom) with powers $r = 1$ (left) and $r = 2$ (right), for the Fish Experiment.
Proposition 2. For some point $x$ in $\mathbb{R}^d$ and some real number $r \ge 1$, let $dF_{x,r}$ and $d\tilde F_{x,r}$ be the push-forward measures, by the function $y \mapsto \|x - y\|^r$, of two probability measures $P$ and $\tilde P$ defined on $\mathbb{R}^d$. Then, for any $m \in (0,1)$,

$$\left|d^r_{P,m,r}(x) - d^r_{\tilde P,m,r}(x)\right| \le \frac{1}{m}\,W_1(dF_{x,r}, d\tilde F_{x,r}).$$

Convergence results for $\Delta_{n,m,r}$ can be directly derived from the stability results given in Theorem 4 and Proposition 2. For instance, it can be easily checked that, for any $x \in \mathbb{R}^d$, $W_1(dF_{x,r}, dF_{x,r,n})$ tends to zero almost surely (see for instance the Introduction Section of del Barrio et al. (1999)). This, together with Proposition 2, gives the almost sure pointwise convergence to zero of $\Delta_{n,m,r}(x)$.


Regarding the convergence in expectation, using Theorem 4 and the result of Fournier and Guillin (2013) or Dereich et al. (2013) that $\mathbb{E}\,W_r^r(P, P_n) \le c\,n^{-r/d}$ for $d > 2r$, we deduce

$$\mathbb{E}\left\|d_{P_n,k/n,r} - d_{P,k/n,r}\right\|_\infty \le \left(\frac{k}{n}\right)^{-1/r}\mathbb{E}\,W_r(P, P_n) \le \left(\frac{k}{n}\right)^{-1/r}\left[\mathbb{E}\,W_r^r(P, P_n)\right]^{1/r} \le c\left(\frac{k}{n}\right)^{-1/r} n^{-1/d}.$$

Nevertheless, this upper bound is not sharp: assume that $k_n := mn$ for some fixed constant $m \in (0,1)$; then the rate is of the order of $n^{-1/d}$. We show below that the parametric rate $1/\sqrt{n}$ can be obtained by considering the alternative stability result given in Proposition 2. In the one-dimensional case, a direct application of Fubini's theorem gives (see for instance Theorem 3.2 in Bobkov and Ledoux (2014))

$$\sqrt{n}\,\mathbb{E}\left[W_1(dF_{x,r}, dF_{x,r,n})\right] \le \int_0^\infty \sqrt{F_{x,r}(t)\,\big(1 - F_{x,r}(t)\big)}\,dt =: J_1(dF_{x,r}), \tag{14}$$

where $dF_{x,r}$ and $dF_{x,r,n}$ are the push-forward probability measures of $P$ and $P_n$ by the function $\|x - \cdot\|^r$.

Note that Bobkov and Ledoux (2014) have completely characterized the convergence of $\mathbb{E}\,W_1(\mu, \mu_n)$ in the one-dimensional case, in terms of $J_1(\mu)$, for $\mu$ a probability measure on the real line and $\mu_n$ its empirical counterpart. From Proposition 2 and the upper bound (14) we derive that

$$\mathbb{E}\left|\Delta_{n,k/n,r}(x)\right| \le \frac{n}{k}\,\frac{J_1(dF_{x,r})}{\sqrt{n}}. \tag{15}$$


The integral $J_1(dF_{x,r})$ is finite if $\mathbb{E}\|X - x\|^{2r+\delta} < \infty$ for some $\delta > 0$. We thus obtain a pointwise rate of convergence of $1/\sqrt{n}$ under reasonable moment conditions, if we take $k_n := mn$ for some fixed constant $m \in (0,1)$. However, the upper bound (15) does not allow us to describe correctly how the rate depends on the parameter $m = k/n$. For instance, if $k/n$ is very small, the bound blows up in all cases, while this should not be the case, for instance with discrete measures. The reason is that the stability results are too global to provide a sharp expectation bound for small values of $k/n$.
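Bound (14) can be illustrated numerically in a simple case. For the uniform distribution on $[0,1]$ (for instance the push-forward $dF_{x,1}$ obtained with $P$ uniform on $[0,1]$ and $x = 0$), $J_1 = \int_0^1 \sqrt{t(1-t)}\,dt = \pi/8$. The Python sketch below, an illustration under these assumptions, computes $W_1$ between the uniform distribution and its empirical counterpart exactly through the quantile representation $W_1 = \int_0^1 |F_n^{-1}(u) - u|\,du$, and compares a Monte Carlo estimate of $\sqrt{n}\,\mathbb{E}W_1$ with $J_1$. The sample size and number of replications are arbitrary.

```python
import numpy as np

def w1_to_uniform(sample):
    """Exact W_1 between U[0,1] and the empirical measure of a sample in [0,1].

    Uses W_1 = int_0^1 |F_n^{-1}(u) - u| du, where F_n^{-1}(u) equals the i-th
    order statistic c on the interval ((i-1)/n, i/n].
    """
    c = np.sort(np.asarray(sample, dtype=float))
    n = len(c)
    a = np.arange(n) / n            # left end of each interval
    b = (np.arange(n) + 1) / n      # right end of each interval
    below = c <= a                  # integrand equals u - c on the interval
    above = c >= b                  # integrand equals c - u on the interval
    inside = ~(below | above)
    out = np.empty(n)
    out[below] = ((a + b) / 2 - c)[below] / n
    out[above] = (c - (a + b) / 2)[above] / n
    out[inside] = (((c - a) ** 2 + (b - c) ** 2) / 2)[inside]
    return float(out.sum())

def j1_uniform(grid=1_000_000):
    """Midpoint-rule value of J_1 = int_0^1 sqrt(t (1 - t)) dt (exact value: pi / 8)."""
    t = (np.arange(grid) + 0.5) / grid
    return float(np.mean(np.sqrt(t * (1 - t))))

rng = np.random.default_rng(1)
n, reps = 1000, 500
mc = np.mean([np.sqrt(n) * w1_to_uniform(rng.uniform(size=n)) for _ in range(reps)])
print(f"sqrt(n) E W_1 ~ {mc:.3f}  <=  J_1 = {j1_uniform():.4f}")
```

The Monte Carlo value stays below $J_1$, as predicted by (14); the gap reflects the fact that (14) is obtained by bounding $\mathbb{E}|F_n(t) - F(t)|$ by its standard deviation.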


B Proofs

B.1 Preliminary results for the DTM

Rewriting the DTM in terms of the quantile function

Let $P$ be a probability distribution in $\mathbb{R}^d$, $x \in \mathbb{R}^d$ and $r \ge 1$. Let $F_{x,r}$ be the distribution function of the random variable $\|x - X\|^r$, where the distribution of the random variable $X$ is $P$. The preliminary distance function to $P$,

$$\delta_{P,u} : x \in \mathbb{R}^d \mapsto \inf\{t > 0\ ;\ P(B(x,t)) \ge u\},$$

can be rewritten in terms of the quantile function $F_{x,r}^{-1}$.

Lemma 4. For any $u \in (0,1)$, we have $\delta^r_{P,u}(x) = F_{x,r}^{-1}(u)$. In particular, $\delta_{P,u}(x) = F_{x,1}^{-1}(u)$.

Proof. Note that for any $t \in \mathbb{R}^+$, $F_{x,r}(t) = P(B(x, t^{1/r}))$. Next,

$$\{t > 0\ ;\ P(B(x, t^{1/r})) \ge u\} = \{s^r\ ;\ s > 0,\ P(B(x,s)) \ge u\},$$

and we deduce that

$$F_{x,r}^{-1}(u) = \inf\{s^r\ ;\ s > 0,\ P(B(x,s)) \ge u\} = \delta^r_{P,u}(x),$$

where we have used the continuity of $s \mapsto s^r$ for the last equality. □

From Lemma 4 we directly derive the expression of the DTM in terms of the quantile function $F_{x,r}^{-1}$, as given in the Introduction Section:

$$d^r_{P,m,r}(x) = \frac{1}{m}\int_0^m F_{x,r}^{-1}(u)\,du.$$
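For the empirical measure $P_n$, the quantile expression above reduces to an average over nearest neighbours: $F_{x,n,r}^{-1}$ is constant, equal to the $i$-th smallest value of $\|x - X_j\|^r$, on $((i-1)/n, i/n]$, so $d^r_{P_n,k/n,r}(x)$ is the mean of the $k$ smallest values of $\|x - X_j\|^r$. The Python sketch below (function names are ours) checks this by comparing the closed form with a direct numerical integration of the empirical quantile function.

```python
import numpy as np

def _dist_r(x, sample, r):
    """Values ||x - X_i||^r; a 1-d sample is treated as points on the real line."""
    pts = np.asarray(sample, dtype=float)
    if pts.ndim == 1:
        pts = pts[:, None]
    return np.linalg.norm(pts - np.asarray(x, dtype=float), axis=1) ** r

def dtem_r(x, sample, k, r=2.0):
    """d^r_{P_n,k/n,r}(x): mean of the k smallest values of ||x - X_i||^r."""
    return float(np.sort(_dist_r(x, sample, r))[:k].mean())

def dtem_r_by_quantiles(x, sample, k, r=2.0, grid=100_000):
    """Same quantity via (n/k) * int_0^{k/n} F_{x,n,r}^{-1}(u) du (midpoint rule)."""
    dist_r = np.sort(_dist_r(x, sample, r))
    n = len(dist_r)
    u = (np.arange(grid) + 0.5) / grid * (k / n)   # midpoints of a grid on (0, k/n)
    idx = np.ceil(n * u).astype(int) - 1           # F_n^{-1}(u) = dist_r[ceil(n u) - 1]
    return float(dist_r[idx].mean())

print(dtem_r(0.0, [3.0, 1.0, 4.0, 2.0], k=2, r=1))  # 1.5
print(dtem_r(0.0, [3.0, 1.0, 4.0, 2.0], k=2, r=2))  # 2.5
```

In practice the $k$ smallest distances are obtained with a nearest-neighbour search (the experiments of Section 4 use the R package FNN for this purpose); the sort above is only for clarity.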


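Proof of Proposition 2

Let $F$ and $\tilde F$ be the cdfs of two probability measures $dF$ and $d\tilde F$ on $\mathbb{R}$. Recall that, for any $r \ge 1$,

$$W_r^r(dF, d\tilde F) = \int_0^1 \left|F^{-1}(u) - \tilde F^{-1}(u)\right|^r du,$$

see for instance Vallender (1974). Thus, for any $m \in (0,1)$,

$$\int_0^m \left|F^{-1}(u) - \tilde F^{-1}(u)\right| du \le W_1(dF, d\tilde F), \tag{16}$$

and the proof follows using the expression of the DTM in terms of the quantile function.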
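For two empirical distributions with the same number $N$ of atoms, the quantile representation above gives $W_1$ as the mean absolute difference between the sorted samples, so Proposition 2 can be checked directly. The sketch below does this in $\mathbb{R}^2$ with two Gaussian samples; the sample size, the point $x$, the value of $r$ and the distributions are arbitrary illustrative choices.

```python
import numpy as np

def w1_1d_equal_size(a, b):
    """W_1 between two empirical measures on R with the same number of atoms:
    int_0^1 |F^{-1}(u) - G^{-1}(u)| du = mean |a_(i) - b_(i)|."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

def pushforward(x, sample, r):
    """Atoms of the push-forward of the empirical measure by y -> ||x - y||^r."""
    return np.linalg.norm(sample - x, axis=1) ** r

rng = np.random.default_rng(2)
N, r, x = 1000, 2.0, np.array([1.0, 0.0])
a = pushforward(x, rng.normal(size=(N, 2)), r)
b = pushforward(x, rng.normal(loc=0.3, scale=1.2, size=(N, 2)), r)

for k in (10, 100, 500):
    m = k / N
    lhs = abs(np.sort(a)[:k].mean() - np.sort(b)[:k].mean())  # |d^r_{P,m,r}(x) - d^r_{P~,m,r}(x)|
    rhs = w1_1d_equal_size(a, b) / m                          # (1/m) W_1(dF_{x,r}, dF~_{x,r})
    print(f"m={m:.3f}: {lhs:.4f} <= {rhs:.4f}")
```

Here the inequality holds exactly, since the left-hand side is $\frac{1}{k}\big|\sum_{i\le k}(a_{(i)} - b_{(i)})\big| \le \frac{1}{k}\sum_{i \le N}|a_{(i)} - b_{(i)}| = W_1/m$.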


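A decomposition of $\Delta_{n,k/n,r}$

For any $x \in \mathbb{R}^d$ and any $r \ge 1$, we have $F_{x,n,r}^{-1}(0) \ge F_{x,r}^{-1}(0) \ge 0$, since $F_{x,r}$ is the cdf of the random distance $\|x - X\|^r$, whose support is included in $\mathbb{R}^+$. From the expression of the DTM in terms of the quantile function and geometric considerations (see Figure 9), we can rewrite $\Delta_{n,k/n,r}$ as given in the following Lemma.

Lemma 5. The quantity $\Delta_{n,k/n,r}(x)$ can be rewritten as follows:

$$\Delta_{n,k/n,r}(x) = \frac{n}{k}\int_0^{k/n}\left\{F_{x,n,r}^{-1}(u) - F_{x,r}^{-1}(u)\right\}du = \frac{n}{k}\int_{F_{x,r}^{-1}(0)}^{F_{x,n,r}^{-1}(k/n)\,\vee\,F_{x,r}^{-1}(k/n)}\left\{\left(\frac{k}{n}\wedge F_{x,r}(t)\right) - \left(\frac{k}{n}\wedge F_{x,n,r}(t)\right)\right\}dt.$$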
Figure 9: Calculation of $\Delta_{n,k/n,r}(x)$ by integrating the grey domain horizontally or vertically.
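The two expressions of Lemma 5 (integrating the grey domain of Figure 9 horizontally or vertically) can be compared numerically. The sketch below assumes $P$ uniform on $[0,1]$, $x = 0$ and $r = 1$, so that $F(t) = t$ on $[0,1]$ and $d^1_{P,k/n,1}(0) = k/(2n)$; the sample size and $k$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 20
m = k / n
dist = np.sort(rng.uniform(size=n))          # ||x - X_i|| with x = 0 and r = 1

# Quantile form: (n/k) int_0^{k/n} {F_n^{-1}(u) - F^{-1}(u)} du, with F^{-1}(u) = u
delta_quantile = dist[:k].mean() - m / 2

# t-form: (n/k) int_{F^{-1}(0)}^{F_n^{-1}(m) v F^{-1}(m)} {(m ^ F(t)) - (m ^ F_n(t))} dt
upper = max(dist[k - 1], m)                   # F_n^{-1}(k/n) is the k-th order statistic
grid = 400_000
t = (np.arange(grid) + 0.5) / grid * upper    # midpoints on (0, upper); here F^{-1}(0) = 0
F = np.clip(t, 0.0, 1.0)
F_n = np.searchsorted(dist, t, side="right") / n
delta_t = (np.minimum(m, F) - np.minimum(m, F_n)).mean() * upper / m

print(delta_quantile, delta_t)
```

The two values agree up to the discretization error of the midpoint rule.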


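B.2 Proof of Theorem 1

We recall that we use the notation $F$ for $F_{x,r}$ and $F_n$ for $F_{x,n,r}$ in the proof.

Upper bound on the fluctuations of $\Delta_{n,k/n,r}(x)$

We first check that $P(|\Delta_{n,k/n,r}(x)| \ge \lambda) = 0$ for $\lambda > \omega_x(1)$. Note that $\omega_x(1) < \infty$ because the support of $P$ is compact. Let $G_n$ and $G_n^{-1}$ be the empirical uniform distribution function and the empirical uniform quantile function (see Appendix C). Starting from the definition of the DTM and using Proposition 3 in Appendix C, we obtain that for $\lambda > 0$ and $k \le n$:

$$P\left(|\Delta_{n,k/n,r}(x)| \ge \lambda\right) \le P\left(\frac{n}{k}\int_0^{k/n}\left|F^{-1}(G_n^{-1}(u)) - F^{-1}(u)\right|du \ge \lambda\right) \le P\left(\frac{n}{k}\int_0^{k/n}\omega_x\left(|G_n^{-1}(u) - u|\right)du \ge \lambda\right) \le P\left(\sup_{u\in[0,k/n]}|G_n^{-1}(u) - u| \ge \omega_x^{-1}(\lambda)\right),$$

and this probability is obviously zero for any $\lambda > \omega_x(1)$.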

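We now prove the deviation bounds, starting from Lemma 5. If $F^{-1}(k/n) \le F_n^{-1}(k/n)$, then

$$\Delta_{n,k/n,r}(x) = \frac{n}{k}\int_{F^{-1}(0)}^{F^{-1}(k/n)}\{F(t) - F_n(t)\}\,dt + \frac{n}{k}\int_{F^{-1}(k/n)}^{F_n^{-1}(k/n)}\left\{\frac{k}{n} - F_n(t)\right\}dt,$$

and thus

$$|\Delta_{n,k/n,r}(x)| \le A + B, \tag{17}$$

where

$$A := \frac{n}{k}\int_{F^{-1}(0)}^{F^{-1}(k/n)}|F(t) - F_n(t)|\,dt \quad\text{and}\quad B := \frac{n}{k}\int_{F^{-1}(k/n)}^{F_n^{-1}(k/n)\,\vee\,F^{-1}(k/n)}\left\{\frac{k}{n} - F_n(t)\right\}dt.$$

If $F^{-1}(k/n) > F_n^{-1}(k/n)$, then $B = 0$,

$$\Delta_{n,k/n,r}(x) = \frac{n}{k}\int_{F^{-1}(0)}^{F_n^{-1}(k/n)}\{F(t) - F_n(t)\}\,dt + \frac{n}{k}\int_{F_n^{-1}(k/n)}^{F^{-1}(k/n)}\left\{F(t) - \frac{k}{n}\right\}dt,$$

and thus, since $F(t) < k/n \le F_n(t)$ on the second domain of integration,

$$\begin{aligned}|\Delta_{n,k/n,r}(x)| &\le \frac{n}{k}\int_{F^{-1}(0)}^{F_n^{-1}(k/n)}|F(t) - F_n(t)|\,dt + \frac{n}{k}\int_{F_n^{-1}(k/n)}^{F^{-1}(k/n)}\left\{\frac{k}{n} - F(t)\right\}dt\\ &\le \frac{n}{k}\int_{F^{-1}(0)}^{F_n^{-1}(k/n)}|F(t) - F_n(t)|\,dt + \frac{n}{k}\int_{F_n^{-1}(k/n)}^{F^{-1}(k/n)}\{F_n(t) - F(t)\}\,dt\\ &\le \frac{n}{k}\int_{F^{-1}(0)}^{F^{-1}(k/n)}|F(t) - F_n(t)|\,dt = A.\end{aligned}$$

In all cases, Inequality (17) is thus satisfied.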


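• Local analysis: deviation bound of $\Delta_{n,k/n,r}(x)$ for $k/n$ close to zero. We now prove the deviation bound for $k/n < 1/2$. We first upper bound the term $A$ in (17). According to Proposition 5 in Appendix C, for any $u_0 \in (0, 1/2)$ and any $\lambda > 0$:

$$P\left(\sup_{u\in[0,u_0]}|G_n(u) - u| \ge \lambda\right) \le 2\exp\left(-\frac{n\lambda^2(1-u_0)^2}{2u_0}\,\frac{1}{1 + \frac{(1-u_0)\lambda}{3u_0}}\right). \tag{18}$$

For $u_0 < 1/2$ and $\lambda > 0$ it yields

$$\begin{aligned}P\left(\sup_{t\in[F^{-1}(0),F^{-1}(u_0))}|F_n(t) - F(t)| \ge \lambda\right) &= P\left(\sup_{u\in[0,u_0)}|G_n(u) - u| \ge \lambda\right)\\ &\le 2\exp\left(-\frac{n\lambda^2(1-u_0)^2}{2u_0}\,\frac{1}{1 + \frac{(1-u_0)\lambda}{3u_0}}\right)\\ &\le 2\exp\left(-\frac{n\lambda^2(1-u_0)^2}{4u_0}\right) + 2\exp\left(-\frac{3n\lambda(1-u_0)}{4}\right),\end{aligned}$$

where we have used Proposition 3 in Appendix C for the equality, (18) for the first inequality, and, for the second inequality, the fact that for any $v, w > 0$, $\exp(-v/(1+w)) \le \exp(-v/2) + \exp(-v/(2w))$. The term $A$ can be upper bounded by controlling the supremum of $|F_n - F|$ over $[F^{-1}(0), F^{-1}(k/n))$; taking $u_0 = k/n$ yields

$$\begin{aligned}P(A \ge \lambda) &\le P\left(\sup_{t\in[F^{-1}(0),F^{-1}(k/n))}|F_n(t) - F(t)| \ge \frac{\lambda\,k/n}{F^{-1}(k/n) - F^{-1}(0)}\right)\\ &\le 2\exp\left(-\frac{k\lambda^2(1 - k/n)^2}{4\,[F^{-1}(k/n) - F^{-1}(0)]^2}\right) + 2\exp\left(-\frac{3k\lambda(1 - k/n)}{4\,[F^{-1}(k/n) - F^{-1}(0)]}\right).\end{aligned} \tag{19}$$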
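The elementary inequality $\exp(-v/(1+w)) \le \exp(-v/2) + \exp(-v/(2w))$, used above to split a Bernstein-type bound into a sub-Gaussian term and a sub-exponential term, follows from $1 + w \le 2\max(1, w)$. A quick numerical check on random inputs (an illustration only, with arbitrary input distributions):

```python
import numpy as np

rng = np.random.default_rng(4)
v = rng.exponential(scale=10.0, size=100_000)
w = rng.exponential(scale=3.0, size=100_000)
lhs = np.exp(-v / (1.0 + w))
# if w <= 1 then 1 + w <= 2, otherwise 1 + w <= 2 w: one of the two terms dominates lhs
rhs = np.exp(-v / 2.0) + np.exp(-v / (2.0 * w))
print(bool(np.all(lhs <= rhs + 1e-15)))  # True
```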


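We now upper bound $B$. We have

$$B \le \frac{n}{k}\left[F_n^{-1}\left(\frac{k}{n}\right) - F^{-1}\left(\frac{k}{n}\right)\right]\left[\frac{k}{n} - F_n\left(F^{-1}\left(\frac{k}{n}\right)\right)\right]\mathbb{1}_{F_n^{-1}(k/n) > F^{-1}(k/n)}. \tag{20}$$

Thus, according to Proposition 3 in Appendix C,

$$P(B \ge \lambda) \le P\left(\frac{n}{k}\left[F^{-1}\left(G_n^{-1}\left(\frac{k}{n}\right)\right) - F^{-1}\left(\frac{k}{n}\right)\right]\left[\frac{k}{n} - G_n\left(\frac{k}{n}\right)\right] \ge \lambda\right) \le P(B_0 \ge \lambda),$$

where

$$B_0 := \frac{n}{k}\,B_1 B_2, \qquad B_1 := \omega_x\left(\left|G_n^{-1}\left(\frac{k}{n}\right) - \frac{k}{n}\right|\right), \qquad B_2 := \left|G_n\left(\frac{k}{n}\right) - \frac{k}{n}\right|.$$

Let $\theta \in (0,1)$ to be chosen further. If $B_0 \ge \lambda$, then $B_1 \ge \theta^{-1}\sqrt{\lambda k/n}$ or $B_2 \ge \theta\sqrt{\lambda k/n}$. Then we can write

$$P(B \ge \lambda) \le P\left(\left|G_n^{-1}\left(\frac{k}{n}\right) - \frac{k}{n}\right| \ge \omega_x^{-1}\left(\frac{1}{\theta}\sqrt{\lambda\,\frac{k}{n}}\right)\right) + P\left(\left|G_n\left(\frac{k}{n}\right) - \frac{k}{n}\right| \ge \theta\sqrt{\lambda\,\frac{k}{n}}\right).$$

Thus,

$$\begin{aligned}P(B \ge \lambda) \le\ & 2\exp\left(-\frac{n^2}{4k}\left[\omega_x^{-1}\left(\frac{1}{\theta}\sqrt{\lambda\,\frac{k}{n}}\right)\right]^2\right) + 2\exp\left(-\frac{3n}{8}\,\omega_x^{-1}\left(\frac{1}{\theta}\sqrt{\lambda\,\frac{k}{n}}\right)\right)\\ & + 2\exp\left(-\frac{n\theta^2\lambda}{4}\right) + 2\exp\left(-\frac{3n\theta}{4}\sqrt{\lambda\,\frac{k}{n}}\right),\end{aligned} \tag{21}$$

where we have used Propositions 4 and 6 with $t = t_0 = u = u_0 = k/n$, together with the inequality $\exp(-v/(1+w)) \le \exp(-v/2) + \exp(-v/(2w))$. According to (17), we have $P(|\Delta_{n,k/n,r}(x)| \ge \lambda) \le P(A \ge \lambda/2) + P(B \ge \lambda/2)$. We then obtain the following deviation bound from Inequalities (19) and (21), using $1 - k/n \ge 1/2$, for any $k/n < 1/2$ and any $\lambda > 0$:

$$\begin{aligned}P\left(|\Delta_{n,k/n,r}(x)| \ge \lambda\right) \le\ & 2\exp\left(-\frac{k\lambda^2}{64\,[F^{-1}(k/n) - F^{-1}(0)]^2}\right) + 2\exp\left(-\frac{3k\lambda}{16\,[F^{-1}(k/n) - F^{-1}(0)]}\right)\\ & + 2\exp\left(-\frac{n^2}{4k}\left[\omega_x^{-1}\left(\frac{1}{\theta}\sqrt{\frac{\lambda}{2}\,\frac{k}{n}}\right)\right]^2\right) + 2\exp\left(-\frac{3n}{8}\,\omega_x^{-1}\left(\frac{1}{\theta}\sqrt{\frac{\lambda}{2}\,\frac{k}{n}}\right)\right)\\ & + 2\exp\left(-\frac{n\theta^2\lambda}{8}\right) + 2\exp\left(-\frac{3n\theta}{4}\sqrt{\frac{\lambda}{2}\,\frac{k}{n}}\right),\end{aligned} \tag{22}$$

where $\theta$ will be chosen further in the proof.

• Global analysis: deviation bound of $\Delta_{n,k/n,r}(x)$ for $k/n \ge 1/2$. For controlling $A$, we now use the DKW Inequality (see Theorem 5); it gives that

$$P(A \ge \lambda) \le P\left(\sup_{u\in[0,1]}|G_n(u) - u| \ge \frac{\lambda\,k/n}{F^{-1}(k/n) - F^{-1}(0)}\right) \le 2\exp\left(-2n\lambda^2\,\frac{(k/n)^2}{[F^{-1}(k/n) - F^{-1}(0)]^2}\right).$$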
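We decompose $B$ into $B_1$ and $B_2$ as before. We use DKW again for $B_2$ and, for the quantile term $B_1$, we note that

$$\left\{\left|G_n^{-1}\left(\frac{k}{n}\right) - \frac{k}{n}\right| \ge \lambda\right\} \subset \left\{\sup_{u\in[0,1]}|G_n^{-1}(u) - u| \ge \lambda\right\} = \left\{\sup_{t\in[0,1]}|G_n(t) - t| \ge \lambda\right\}.$$

We find that for any $\theta > 0$:

$$P\left(|\Delta_{n,k/n,r}(x)| \ge \lambda\right) \le 2\exp\left(-\frac{n\lambda^2}{2}\,\frac{(k/n)^2}{[F^{-1}(k/n) - F^{-1}(0)]^2}\right) + 2\exp\left(-2n\left[\omega_x^{-1}\left(\frac{1}{\theta}\sqrt{\frac{\lambda}{2}\,\frac{k}{n}}\right)\right]^2\right) + 2\exp\left(-n\theta^2\lambda\,\frac{k}{n}\right), \tag{23}$$

where $\theta$ will be chosen further in the proof.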


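Upper bound on the expectation of $\Delta_{n,k/n,r}(x)$

• Case $k/n < 1/2$. By integrating the probability in (22), we obtain

$$\mathbb{E}|\Delta_{n,k/n,r}(x)| \le C\,\frac{F^{-1}(k/n) - F^{-1}(0)}{\sqrt{k}} + I_1 + I_2 + \frac{C}{\theta^2 n}, \tag{24}$$

where $C$ is an absolute constant and where $I_1$ and $I_2$ are the integrals of the two terms of (22) which involve $\omega_x^{-1}$:

$$I_1 := 2\int_0^\infty \exp\left(-\frac{n^2}{4k}\,\omega_x^{-1}(v_\lambda)^2\right)d\lambda, \qquad I_2 := 2\int_0^\infty \exp\left(-\frac{3n}{8}\,\omega_x^{-1}(v_\lambda)\right)d\lambda, \qquad v_\lambda := \frac{1}{\theta}\sqrt{\frac{\lambda}{2}\,\frac{k}{n}}.$$

Since $\omega_x(u)/u$ is a non increasing function, $v \mapsto \omega_x^{-1}(v)/v$ is a non decreasing function. Then, for any $\lambda_1 > 0$ and any $\lambda \ge \lambda_1$, we have $\omega_x^{-1}(v_\lambda) \ge \sqrt{\lambda/\lambda_1}\;\omega_x^{-1}(v_{\lambda_1})$, and thus

$$I_1 \le 2\lambda_1 + 2\int_{\lambda \ge \lambda_1}\exp\left(-\frac{n^2}{4k}\,\frac{\lambda}{\lambda_1}\,\omega_x^{-1}(v_{\lambda_1})^2\right)d\lambda,$$

with a similar bound for $I_2$. Taking $\lambda_1$ of the order of $n\theta^2\omega_x^2(2\sqrt{k}/n)/k$ in these bounds, we find that

$$I_1 + I_2 \le C\,\frac{n\theta^2}{k}\,\omega_x^2\left(\frac{2\sqrt{k}}{n}\right).$$

We then choose

$$\theta^2 = \frac{\sqrt{k}}{n\,\omega_x(2\sqrt{k}/n)}$$

to balance the terms $I_1 + I_2$ and $C/(\theta^2 n)$ in (24). The deviation bound given in the theorem corresponds to this choice for $\theta$.

Finally, note that $\omega_x(2\sqrt{k}/n) \le 2\,\omega_x(\sqrt{k}/n)$ because $\omega_x(u)/u$ is a non increasing function, and we obtain that there exists an absolute constant $C$ such that

$$\mathbb{E}|\Delta_{n,k/n,r}(x)| \le C\,k^{-1/2}\left[F^{-1}\left(\frac{k}{n}\right) - F^{-1}(0) + \omega_x\left(\frac{\sqrt{k}}{n}\right)\right]. \tag{25}$$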


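• Case $k/n \ge 1/2$. We integrate the deviations (23) and, proceeding as in the previous case for the term involving $\omega_x^{-1}$, we obtain that

$$\mathbb{E}|\Delta_{n,k/n,r}(x)| \le C\left[\frac{F^{-1}(k/n) - F^{-1}(0)}{\sqrt{n}} + \frac{n\theta^2}{k}\,\omega_x^2\left(\frac{1}{\sqrt{n}}\right) + \frac{1}{\theta^2 k}\right]. \tag{26}$$

We then choose

$$\theta^2 = \frac{1}{\sqrt{n}\,\omega_x(1/\sqrt{n})}.$$

The deviation bound given in the Theorem corresponds to this choice for $\theta$. Since $\sqrt{n/k} \le \sqrt{2} \le 2$, we see that the expectation bound (26) for this choice of $\theta$ can be rewritten as the expectation bound (25). This concludes the proof of Theorem 1.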

B.3 Proof of Proposition

Figure 10: The two quantile functions $F_{0,r}^{-1}$ and $F_{1,r}^{-1}$.

We first consider the case $d = 1$. For applying Le Cam's Lemma (Lemma 8), we need to find two probabilities $P_0$ and $P_1$ whose distances to measure are sufficiently far from each other. Without loss of generality we can assume that $x = 0$. Let $P \in \mathcal{P}_\omega$ which satisfies (11). We can assume that $P$ is supported on $\mathbb{R}^+$, since the push-forward measure of $P$ by the norm is in $\mathcal{P}_\omega$ and also satisfies (11). Let $F^{-1}$ be the quantile function of $P$. For some $n \ge 1$, let $P_0 := P$ and let $P_1 := \frac{1}{n}\delta_0 + P_{|[0, F^{-1}(1 - 1/n)]}$, where $\delta_0$ is a Dirac distribution at zero and where $P_{|[a,b]}$ is the restriction of the measure $P$ to the set $[a,b]$. For $i = 0, 1$, let $P_{i,r}$ be the push-forward measure of $P_i$ by the power function $t \mapsto t^r$ on $\mathbb{R}^+$. Let also $F_{i,r}$ and $F_{i,r}^{-1}$ be the distribution function and the quantile function of $P_{i,r}$; see Figure 10 for an illustration. Note that $P_{1,r} = \frac{1}{n}\delta_0 + P_{0,r|[0, F_{0,r}^{-1}(1 - 1/n)]}$. Thus $P_1$ is in $\mathcal{P}_\omega$ because

$$F_{1,r}^{-1}(u) = \begin{cases} 0 & \text{if } u \le \frac{1}{n},\\ F_{0,r}^{-1}\left(u - \frac{1}{n}\right) & \text{otherwise.}\end{cases}$$
The probability measures $P_0$ and $P_1$ are absolutely continuous with respect to the measure $\mu := \delta_0 + P$. The density of $P_0$ with respect to $\mu$ is $p_0 := \mathbb{1}_{(0,+\infty)}$, whereas the density of $P_1$ with respect to $\mu$ is $p_1 := \frac{1}{n}\mathbb{1}_{\{0\}} + \mathbb{1}_{(0, F^{-1}(1 - 1/n)]}$. Thus,

$$\ell_1(p_0, p_1) = \int_{\mathbb{R}^+}|p_1(t) - p_0(t)|\,d\mu(t) = \frac{2}{n},$$

so that $TV(P_0, P_1) = 1/n$. Next, $[1 - TV(P_0,P_1)]^{2n} = \left(1 - \frac{1}{n}\right)^{2n} \to e^{-2}$ as $n$ tends to infinity. Moreover,

$$\left|d^r_{P_0,k/n,r}(0) - d^r_{P_1,k/n,r}(0)\right| = \frac{n}{k}\int_0^{k/n}\left\{F_{0,r}^{-1}(u) - F_{1,r}^{-1}(u)\right\}du = \frac{n}{k}\int_{(k-1)/n}^{k/n}F_{0,r}^{-1}(u)\,du \ge \frac{1}{k}\,F_{0,r}^{-1}\left(\frac{k}{n} - \frac{1}{n}\right).$$

Thus, using Assumption (11) for the last inequality,

[Display not recoverable from the source extraction.]

We conclude using Le Cam's Lemma.

We now consider the case $d \ge 2$. Let $P \in \mathcal{P}_\omega$ which satisfies (11). By considering the push-forward measure of $P$ by the function

$$y \mapsto (\|y\|, 0, \dots, 0),$$

we see that it is always possible to assume that there exists a probability $P$ supported on $\mathbb{R}^+ \times \{0\}^{d-1}$ which satisfies (11). It is then possible to define $P_0$ and $P_1$ as in the case $d = 1$, except that their support is now in $\mathbb{R}^+ \times \{0\}^{d-1}$. Following the same construction, the quantities $TV(P_0, P_1)$ and $d^r_{P_0,k/n,r}(x) - d^r_{P_1,k/n,r}(x)$ take the same values as in the case $d = 1$. We thus obtain the same lower bound as in the case $d = 1$.


B.4 Proof of Theorem 2

Inequality (17) in the proof of Theorem 1 is still valid. We can also use the deviation bound (19) on $A$ for the case $k/n < 1/2$. Regarding the deviation bound on $B$, we restart from Inequality (20) and we note that

[Display not recoverable from the source extraction.]

By definition of $B$, $B_2$ and $B_3$, and using Proposition 3 in Appendix C, we obtain that

$$P\left(|\Delta_{n,k/n,r}(x)| \ge \lambda\right) \le P\left(A \ge \frac{\lambda}{2}\right) + P\left(B \ge \frac{\lambda}{2}\right) \le P\left(A \ge \frac{\lambda}{2}\right) + P\left(B_2 \ge \frac{\lambda}{4}\right) + P\left(B_3 \ge \frac{\lambda}{4}\right), \tag{27}$$

where $P(A \ge \lambda/2) + P(B_2 \ge \lambda/4)$ has already been upper bounded in the proof of Theorem 1. We now upper bound the deviations of $B_3$. For any $\theta_3 \in (0,1)$ to be chosen further, we have:


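[Display not recoverable from the source extraction.]

We have that $P(B_3 > \lambda/4)$ is upper bounded by the sum of a probability involving $B_4$ and of a second probability, which is itself bounded by two exponential terms:

[Inequality (28): display not recoverable from the source extraction.]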


The probability involving $B_4$ can be upper bounded in two different ways: one using a concentration argument and one based on the Beta distribution of $G_n^{-1}(k/n)$. According to Proposition 6, we have

[Inequality (29): display not recoverable from the source extraction.]


Next, it is well known (see for instance p. 97 in Chapter 3 of Shorack and Wellner (2009)) that $G_n^{-1}(k/n)$ has a Beta$(k, n-k+1)$ distribution with density on $[0,1]$:

$$t \mapsto \frac{n!}{(k-1)!\,(n-k)!}\,t^{k-1}(1-t)^{n-k}.$$

Thus, for any $t \in (0,1)$,

$$P\left(G_n^{-1}\left(\frac{k}{n}\right) \ge 1 - t\right) \le \binom{n}{k-1}\,t^{\,n-k+1}.$$


Thus,

[Inequality (30): display not recoverable from the source extraction.]

where the first inequality allows us to deal with a strict comparison, which is necessary to rewrite the probability in terms of the cdf. Note that a similar bound can be obtained using Bennett's inequality for $B_4$.
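The two facts used above on $G_n^{-1}(k/n)$, namely that it is the $k$-th order statistic of $n$ uniform variables with a Beta$(k, n-k+1)$ distribution, and the tail bound $P(G_n^{-1}(k/n) \ge 1 - t) \le \binom{n}{k-1}t^{n-k+1}$, can be checked as follows. The sketch computes the exact tail through the identity $P(U_{(k)} \ge 1 - t) = P(\mathrm{Bin}(n, 1-t) \le k-1)$ and compares a Monte Carlo mean with the Beta mean $k/(n+1)$; the values of $n$, $k$ and $t$ are illustrative.

```python
import math
import numpy as np

def beta_tail_exact(n, k, t):
    """P(U_(k) >= 1 - t): fewer than k of the n uniforms fall in [0, 1 - t)."""
    return sum(math.comb(n, j) * (1 - t) ** j * t ** (n - j) for j in range(k))

def beta_tail_bound(n, k, t):
    """Union bound: at least n - k + 1 of the n uniforms fall in [1 - t, 1]."""
    return math.comb(n, k - 1) * t ** (n - k + 1)

for n, k, t in [(20, 3, 0.1), (50, 10, 0.3), (100, 50, 0.4)]:
    print(n, k, t, beta_tail_exact(n, k, t), beta_tail_bound(n, k, t))

rng = np.random.default_rng(5)
n, k = 50, 10
order_stat = np.sort(rng.uniform(size=(20_000, n)), axis=1)[:, k - 1]  # G_n^{-1}(k/n)
print(order_stat.mean(), k / (n + 1))   # Monte Carlo mean vs Beta(k, n-k+1) mean
```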

We now upper bound $\mathbb{E}|\Delta_{n,k/n,r}(x)|$. We only need to control the deviations of $B_3$. Since $P$ has a moment of order $r$, for any $t > 0$:

$$t\,(1 - F(t)) = t\,P(\|x - X\|^r > t) \le \mathbb{E}\|x - X\|^r =: C_{x,r}.$$

Then for any $\Lambda > 0$ (and $n$ larger than 3):

[Display not recoverable from the source extraction.]

We choose $\Lambda$ to balance the two terms inside the brackets:

[Display not recoverable from the source extraction.]

and then

[Display not recoverable from the source extraction.]

where $C_{x,r,m}$ only depends on $C_{x,r}$ and $m$. We then choose $\theta_3$ accordingly and we obtain

[Display not recoverable from the source extraction.]

The deviation bound given in the Theorem derives from (27), (28), (29) and (30) with this value for $\theta_3$.
B.5 Proof of Theorem 3

We first recall the following Lemma from Chazal et al. (2011b).

Lemma 6 (Chazal et al. (2011b)). For any $(x,y) \in (\mathbb{R}^d)^2$ and any $m \in (0,1)$:

$$\left|d_{P,m,r}(x) - d_{P,m,r}(y)\right| \le \|x - y\|.$$

The next Lemma directly derives from Lemma 6.

Lemma 7. For $r = 1$, the function $x \mapsto \Delta_{n,m,1}(x)$ is 2-Lipschitz on $\mathbb{R}^d$, as the difference of two 1-Lipschitz functions. For $r > 1$, the function $x \mapsto \Delta_{n,m,r}(x)$ is $C_{P,r}$-Lipschitz on the compact domain $\mathcal{F}$, where $C_{P,r}$ depends on $r$ and on the Hausdorff distance between $\mathcal{F}$ and the support of $P$.


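We give the proof of the Theorem for $r = 1$. The calculations are also valid for $r > 1$ by replacing $\lambda$ by $\lambda C_{P,r}$ in the probability bounds. The deviation bound of the Theorem can be proved with a simple union bound strategy. Up to enlarging the constant $c$, we can write

$$N(\mathcal{F}, \lambda/4) \le c\,\lambda^{-\nu} \quad\text{for any } \lambda \le \omega_P(1). \tag{31}$$

Now, for a given $\lambda \le \omega_P(1)$, there exist an integer $N \le c\lambda^{-\nu}$ and $N$ points $(x_1, \dots, x_N)$ lying in $\mathcal{F}$ such that $\bigcup_{i=1}^N B(x_i, \lambda/4) \supseteq \mathcal{F}$. For any point $x \in \mathcal{F}$, there exists a point $\pi_\lambda(x)$ of $\{x_1, \dots, x_N\}$ such that $\|x - \pi_\lambda(x)\| \le \lambda/4$. According to Lemma 7, we have

$$\left|\Delta_{n,k/n,1}(x) - \Delta_{n,k/n,1}(\pi_\lambda(x))\right| \le \frac{\lambda}{2}.$$

According to Theorem 1, we have for any $k < n/2$ and any $\lambda > 0$:

$$P\left(\max_{i=1,\dots,N}\left|\Delta_{n,k/n,1}(x_i)\right| \ge \frac{\lambda}{2}\right) \le \begin{cases} 1 \wedge 2c\lambda^{-\nu}\,\square(\lambda) & \text{if } \lambda \le 2\omega_P(1),\\ 0 & \text{if } \lambda > 2\omega_P(1),\end{cases} \tag{32}$$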


where $\square(\lambda) =: \square_1(\lambda) + \square_2(\lambda) + \square_3(\lambda) + \square_4(\lambda) + \square_5(\lambda) + \square_6(\lambda)$ is a sum of six exponential terms derived from the deviation bound of Theorem 1 [explicit expression not recoverable from the source extraction].

Using (31) and (32), we find that $P\left(\sup_{x\in\mathcal{F}}|\Delta_{n,k/n,1}(x)| \ge \lambda\right)$ is also upper bounded by the right-hand term of (32).


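We now integrate each term in $\lambda \mapsto 1 \wedge 2c\lambda^{-\nu}\square(\lambda)$. For the first one, write $\square_1(\lambda) = \exp(-\alpha_{k,n}\lambda^2)$ [the explicit value of $\alpha_{k,n}$ is not recoverable from the source extraction]; then for any $\lambda_{k,n} > 0$:

$$\int_0^\infty 1 \wedge 2c\lambda^{-\nu}\square_1(\lambda)\,d\lambda \le \lambda_{k,n} + 2c\,\lambda_{k,n}^{-\nu}\int_{\lambda_{k,n}}^\infty \exp(-\alpha_{k,n}\lambda^2)\,d\lambda \le \lambda_{k,n} + \frac{c\,\lambda_{k,n}^{-\nu-1}}{\alpha_{k,n}}\exp\left(-\alpha_{k,n}\lambda_{k,n}^2\right).$$

We balance these two terms by taking $\lambda_{k,n}$ of the order of $\sqrt{\log_+(\cdot)/\alpha_{k,n}}$; it gives:

[Inequality (33): display not recoverable from the source extraction.]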


The upper bound for the second term can be obtained in the same way. For the third term, let $\beta_{k,n}$ be defined as in the source [definition not recoverable from the source extraction]. Since $\omega_P^{-1}$ is non decreasing, for any $\lambda_{k,n} > 0$:

[Display not recoverable from the source extraction.]

We balance the two terms in the upper bounds by taking

[Display not recoverable from the source extraction.]

Indeed, we then obtain that:

[Display not recoverable from the source extraction.]

where we have used $\log_+ \ge 1$ and the fact that $\omega_P(u)$ is non decreasing for the second inequality.

where we have used log"'’ > 1 and the fact that tap (u) is non decreasing for the second inequality. Since 
Since $\omega_P(u)/u$ is non increasing and $\log_+ \ge 1$, we find that

[Inequality (34): display not recoverable from the source extraction.]

We proceed in the same way to show that the upper bound (34) is also valid for $\square_5$ and $\square_6$. The bound in expectation given in the Theorem is of the order of the sum of the upper bounds (33) and (34).


C Uniform empirical and quantile processes

This section brings together known exponential inequalities for the uniform empirical process and for the uniform quantile process. These results can be found for instance in Chapter 11 of Shorack and Wellner (2009).

Let $\xi_1, \dots, \xi_n$ be $n$ i.i.d. random variables, uniform on $[0,1]$. The uniform empirical distribution function is defined by

$$G_n(t) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}_{\xi_i \le t} \quad\text{for } 0 \le t \le 1.$$

The inverse uniform empirical distribution function is the function

$$G_n^{-1}(u) = \inf\{t \mid G_n(t) \ge u\} \quad\text{for } 0 \le u \le 1.$$
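The uniform empirical distribution function and its inverse can be evaluated directly from the sorted sample. The sketch below (illustrative helper names) also checks a fact used in the proof of Theorem 1 for $k/n \ge 1/2$: the uniform empirical and quantile processes have the same supremum norm, both equal to $\max_i \max\big(i/n - \xi_{(i)},\ \xi_{(i)} - (i-1)/n\big)$. The sample size and grid resolution are arbitrary.

```python
import numpy as np

def G_n(xi_sorted, t):
    """Uniform empirical cdf: proportion of the xi_i that are <= t."""
    return np.searchsorted(xi_sorted, t, side="right") / len(xi_sorted)

def G_n_inv(xi_sorted, u):
    """inf{t : G_n(t) >= u}, i.e. the ceil(n u)-th order statistic, for u in (0, 1]."""
    n = len(xi_sorted)
    idx = np.clip(np.ceil(n * np.asarray(u)).astype(int) - 1, 0, n - 1)
    return xi_sorted[idx]

rng = np.random.default_rng(6)
n = 50
xi = np.sort(rng.uniform(size=n))
i = np.arange(1, n + 1)
closed_form = np.max(np.maximum(i / n - xi, xi - (i - 1) / n))

grid = (np.arange(1, 2_000_001) - 0.5) / 2_000_000   # fine grid of (0, 1)
sup_emp = np.max(np.abs(G_n(xi, grid) - grid))
sup_quant = np.max(np.abs(G_n_inv(xi, grid) - grid))
print(closed_form, sup_emp, sup_quant)
```

Both grid suprema match the closed form up to the grid resolution.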
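Proposition 3. For any $x$ and any $n \in \mathbb{N}^*$,

$$F_{x,n} - F_x \overset{\mathcal{L}}{=} G_n \circ F_x - F_x \quad\text{and}\quad F_{x,n}^{-1} - F_x^{-1} \overset{\mathcal{L}}{=} F_x^{-1} \circ G_n^{-1} - F_x^{-1},$$

where $F_x$ and $F_{x,n}$ are defined in the Introduction Section.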

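C.1 Exponential inequalities for the uniform empirical process

Let the function $\psi$ be defined on $\mathbb{R}$ by

$$\psi(\lambda) := \begin{cases} \dfrac{2\left[(1+\lambda)\left(\log(1+\lambda) - 1\right) + 1\right]}{\lambda^2} & \text{if } \lambda \ge -1,\ \lambda \ne 0,\\ 1 & \text{if } \lambda = 0,\\ +\infty & \text{otherwise,}\end{cases}$$

with the convention $0 \log 0 = 0$. The next result is a pointwise exponential inequality for the deviations of the uniform empirical process $\left(\sqrt{n}\,[G_n(t) - t]\right)_{0 \le t \le 1}$.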


Proposition 4 (Inequality 1 and Proposition 1 in Shorack and Wellner (2009)). For any $0 < t \le t_0 \le 1/2$ and any $\lambda > 0$, we have

$$P\left(\sqrt{n}\,|G_n(t) - t| \ge \lambda\right) \le 2\exp\left(-\frac{\lambda^2}{2t}\,\psi\left(\frac{\lambda}{t\sqrt{n}}\right)\right) \le 2\exp\left(-\frac{\lambda^2}{2t_0}\,\psi\left(\frac{\lambda}{t_0\sqrt{n}}\right)\right) \le 2\exp\left(-\frac{\lambda^2}{2t_0}\,\frac{1}{1 + \frac{\lambda}{3t_0\sqrt{n}}}\right).$$

The first Inequality comes from Bennett's Inequality and from the fact that $nG_n(t)$ follows a Binomial$(n,t)$ distribution. The second Inequality derives from the fact that $\lambda \mapsto \lambda\psi(\lambda)$ is a non decreasing function, see Point (9) of Proposition 1 p. 441 in Shorack and Wellner (2009). The last inequality is Bernstein's Inequality; it can be derived by upper bounding Bennett's Inequality with the following result, see Point (10) of Proposition 1 p. 441 in Shorack and Wellner (2009):

$$\psi(\lambda) \ge \frac{1}{1 + \lambda/3} \quad\text{for any } \lambda > 0. \tag{35}$$


The famous DKW inequality (Dvoretzky et al. (1956)) gives a universal exponential inequality for empirical processes. The tight constant comes from Massart (1990):

Theorem 5. For any $\lambda > 0$,

$$P\left(\sup_{t\in[0,1]}\sqrt{n}\,|G_n(t) - t| > \lambda\right) \le 2\exp\left(-2\lambda^2\right).$$
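The DKW inequality with Massart's constant can be illustrated by simulation. The sketch below, with arbitrary choices of $n$, of the number of replications and of $\lambda$, computes the exact Kolmogorov-Smirnov statistic $\sup_t |G_n(t) - t|$ from the order statistics and compares the empirical frequency of $\{\sqrt{n}\sup_t|G_n(t) - t| > \lambda\}$ with $2\exp(-2\lambda^2)$. Since the bound is asymptotically tight, the two numbers are close, and the comparison is only meaningful up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 200, 4000
xi = np.sort(rng.uniform(size=(reps, n)), axis=1)
i = np.arange(1, n + 1)
ks = np.max(np.maximum(i / n - xi, xi - (i - 1) / n), axis=1)   # sup_t |G_n(t) - t|

freqs = {}
for lam in (0.8, 1.0, 1.2):
    freqs[lam] = float(np.mean(np.sqrt(n) * ks > lam))
    print(f"lambda={lam}: frequency {freqs[lam]:.4f}  vs  bound {2 * np.exp(-2 * lam**2):.4f}")
```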

However, in the neighborhood of the origin, a tighter uniform exponential inequality can be given. 


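Proposition 5 (Inequality 2 p. 444 in Shorack and Wellner (2009)). Let $t_0 \in (0, 1/2)$. Then, for any $\lambda > 0$,

$$P\left(\sup_{t\in[0,t_0]}\sqrt{n}\,\frac{|G_n(t) - t|}{1 - t} \ge \frac{\lambda}{1 - t_0}\right) \le 2\exp\left(-\frac{\lambda^2}{2t_0}\,\psi\left(\frac{\lambda}{t_0\sqrt{n}}\right)\right) \le 2\exp\left(-\frac{\lambda^2}{2t_0}\,\frac{1}{1 + \frac{\lambda}{3t_0\sqrt{n}}}\right).$$

This local result directly derives from the fact that $\left(\frac{G_n(t) - t}{1 - t}\right)_{0 \le t < 1}$ is a martingale (Proposition 1 p. 133 in Shorack and Wellner (2009)). The second inequality directly derives from the previous one together with Inequality (35).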


C.2 Exponential inequalities for the uniform quantile process

The general strategy followed in Shorack and Wellner (2009) to prove exponential inequalities for the uniform quantile process consists in rewriting inequalities on $G_n^{-1}$ into inequalities on $G_n$. For more details see for instance the proof of Inequality 2 p. 415, or Lemma 1 p. 457 in Shorack and Wellner (2009). We introduce the function $\tilde\psi$ defined on $\mathbb{R}^+$ by

$$\tilde\psi(\lambda) := \frac{1}{1 + \lambda}\,\psi\left(\frac{-\lambda}{1 + \lambda}\right).$$

We give below a pointwise exponential bound for the uniform quantile process.


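Proposition 6 (Inequality 1 p. 453 in Shorack and Wellner (2009)). For all $\lambda > 0$ and all $0 < u \le u_0 < 1$, we have

$$P\left(\sqrt{n}\,|G_n^{-1}(u) - u| \ge \lambda\right) \le 2\exp\left(-\frac{\lambda^2}{2u}\,\tilde\psi\left(\frac{\lambda}{u\sqrt{n}}\right)\right) \le 2\exp\left(-\frac{\lambda^2}{2u_0}\,\tilde\psi\left(\frac{\lambda}{u_0\sqrt{n}}\right)\right) \le 2\exp\left(-\frac{\lambda^2}{2u_0}\,\frac{1}{1 + \frac{2\lambda}{3u_0\sqrt{n}}}\right).$$

The second Inequality derives from the property that $\lambda \mapsto \lambda\tilde\psi(\lambda)$ is a non decreasing function, see point (10) of Proposition 1 p. 455 in Shorack and Wellner (2009). The last inequality comes from the following lower bound, see Point (12) of Proposition 1 in Shorack and Wellner (2009):

$$\tilde\psi(\lambda) \ge \frac{1}{1 + 2\lambda/3} \quad\text{for any } \lambda > 0. \tag{36}$$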
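The functions $\psi$ and $\tilde\psi$ and the lower bounds (35) and (36) are easy to evaluate numerically. The sketch below (a check on a finite grid of $\lambda \in [0.01, 50]$, not a proof) also verifies that $\lambda \mapsto \lambda\psi(\lambda)$ and $\lambda \mapsto \lambda\tilde\psi(\lambda)$ are non decreasing on that grid, as used in Propositions 4 and 6. The grid starts away from zero because near the origin both sides of (35) and (36) agree to second order and floating-point cancellation would dominate.

```python
import numpy as np

def h(x):
    """h(x) = x (log x - 1) + 1, with the convention 0 log 0 = 0."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x > 0, x, 1.0)
    return np.where(x > 0, x * (np.log(safe) - 1.0), 0.0) + 1.0

def psi(lam):
    """psi(lambda) = 2 h(1 + lambda) / lambda^2 for lambda >= -1, lambda != 0; psi(0) = 1."""
    lam = np.asarray(lam, dtype=float)
    safe = np.where(lam == 0, 1.0, lam)
    return np.where(lam == 0, 1.0, 2.0 * h(1.0 + lam) / safe**2)

def psi_tilde(lam):
    """psi~(lambda) = psi(-lambda / (1 + lambda)) / (1 + lambda), for lambda >= 0."""
    lam = np.asarray(lam, dtype=float)
    return psi(-lam / (1.0 + lam)) / (1.0 + lam)

lam = np.linspace(0.01, 50.0, 200_001)
print(np.all(psi(lam) >= 1.0 / (1.0 + lam / 3.0)))              # (35)
print(np.all(psi_tilde(lam) >= 1.0 / (1.0 + 2.0 * lam / 3.0)))  # (36)
print(np.all(np.diff(lam * psi(lam)) >= -1e-12),
      np.all(np.diff(lam * psi_tilde(lam)) >= -1e-12))
```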


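The following result is a uniform exponential inequality for the quantile process in the neighborhood of the origin.

Proposition 7 (Inequality 2 p. 457 in Shorack and Wellner (2009)). Let $u_0 \in (0, 1/2)$ and $n \ge 1$. Then, for any $\lambda > 0$ such that

$$\lambda \le \sqrt{n}\left(\frac{1}{2} - u_0\right), \tag{37}$$

we have

$$P\left(\sup_{u\in[0,u_0]}\frac{\sqrt{n}\,|G_n^{-1}(u) - u|}{1 - u} \ge \frac{\lambda}{1 - u_0}\right) \le 2\exp\left(-\frac{\lambda^2}{2u_0}\,\tilde\psi\left(\frac{\lambda}{u_0\sqrt{n}}\right)\right) \le 2\exp\left(-\frac{\lambda^2}{2u_0}\,\frac{1}{1 + \frac{2\lambda}{3u_0\sqrt{n}}}\right).$$

The first Inequality comes from Shorack and Wellner (2009); the second Inequality is deduced from the first one using (36).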


C.3 Le Cam's Lemma

The version of Le Cam's Lemma given below is from Yu (1997). Recall that the total variation distance between two distributions $P_0$ and $P_1$ on a measurable space $(\mathcal{X}, \mathcal{B})$ is defined by

$$TV(P_0, P_1) = \sup_{B \in \mathcal{B}}|P_0(B) - P_1(B)|.$$

Moreover, if $P_0$ and $P_1$ have densities $p_0$ and $p_1$ with respect to the same measure $\lambda$ on $\mathcal{X}$, then

$$TV(P_0, P_1) = \frac{1}{2}\,\ell_1(p_0, p_1) := \frac{1}{2}\int_{\mathcal{X}}|p_0 - p_1|\,d\lambda.$$

Lemma 8. Let $\mathcal{P}$ be a set of distributions. For $P \in \mathcal{P}$, let $\theta(P)$ take values in a metric space $(\mathcal{X}, \rho)$. Let $P_0$ and $P_1$ in $\mathcal{P}$ be any pair of distributions. Let $X_1, \dots, X_n$ be drawn i.i.d. from some $P \in \mathcal{P}$. Let $\hat\theta = \hat\theta(X_1, \dots, X_n)$ be any estimator of $\theta(P)$; then

$$\sup_{P \in \mathcal{P}}\mathbb{E}_{P^n}\,\rho(\hat\theta, \theta) \ge \frac{1}{8}\,\rho\big(\theta(P_0), \theta(P_1)\big)\left[1 - TV(P_0, P_1)\right]^{2n}.$$